
Can AI Revolutionize Quotation Automation in Direct Procurement?

Business | July 13, 2022 | By Zumen | ML, Quotation Process

This is the first part of a multi-part series on how Zumen developed a feature to extract information from quotations and invoices. Upcoming parts will cover the technology in depth, its importance to the industry, and the business problems it can solve.
If you're new to supply chain and direct material procurement, don't worry. Our two ultimate guides explain direct procurement in depth.

  1. Direct Material Procurement Cycle Guide – Part 1: Detailed explanation of the Source-to-Contract process.
  2. Direct Material Procurement Cycle Guide – Part 2: Detailed explanation of the Procure-to-Pay process.

Before moving on, consider a real-world scenario. Take a medium- to large-scale product manufacturer. It deals with many parts every day, anywhere from tens to thousands depending on the complexity of the product. In the New Product Development process, once the parts to be outsourced are finalized, buyers request quotations from different suppliers for those parts in the Bill of Materials (BOM). Say three suppliers are considered for each part, and each sends its quotation by email. Multiply that by the number of outsourced parts and the total quickly becomes staggering: at three quotations per part, 500 outsourced parts would mean 1,500 quotations. Although plenty of procurement software is available on the market, buyers still find it difficult to manage this volume and to bring the data into their systems manually. Quotations contain critical information, and buyers need to analyze each one to make important business decisions. It is a time-consuming process in which they spend most of their time transferring information from documents into spreadsheets. Processing hundreds of quotations every day is a nightmare.

This problem raises a series of questions.

Are there products or services that can reduce this manual effort?
Can we make a digital system to read and understand documents?
Can we extract information from documents seamlessly?
Can machines really understand documents with different layouts?

If we can get machines to read and understand these documents, the gain in operational efficiency for an organization would be huge. These documents usually sit in emails, where most negotiations happen. Technology can crunch the data inside the documents and point us to the optimal choice, but only if all of that data is readily available in the first place.

Before machines can analyze documents, the documents must be converted to a format machines can process. Images and paper documents need to be translated into digital, editable formats.
This is where AI plays an important role.

How to Teach Machines to Read?

Understanding the contents of a document is a fundamental challenge for machines. Information is present as characters, which can be letters, digits, or special characters. A character carries no instruction for the computer. For an image document, the computer stores only the ink patterns in memory: the characters are captured as pixels, which are nothing but intensity values. The technology that processes this pixel data and matches it to letters and numbers is called Optical Character Recognition (OCR). But OCR is only one piece of the puzzle. Before we can get to OCR, we have to understand the layout of the document.
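
To make the idea of "pixels matched to characters" concrete, here is a minimal sketch in Python. It is a toy, not how production OCR engines work: the 5x3 glyph templates are hand-drawn assumptions, and the matching rule (pick the template with the fewest differing pixels) is the simplest one that illustrates the principle.

```python
# Toy glyph templates: 5 rows x 3 columns of pixel intensities
# (1 = ink, 0 = background). These hand-drawn shapes are illustrative
# assumptions, not a real font.
TEMPLATES = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "7": ["111", "001", "010", "010", "010"],
}

def to_pixels(rows):
    """Flatten rows of '0'/'1' characters into a list of integer intensities."""
    return [int(ch) for row in rows for ch in row]

def recognize(glyph_rows):
    """Return the template character whose pixels differ least from the glyph."""
    glyph = to_pixels(glyph_rows)

    def distance(char):
        template = to_pixels(TEMPLATES[char])
        return sum(abs(a - b) for a, b in zip(glyph, template))

    return min(TEMPLATES, key=distance)

# A noisy "7": one pixel in the fourth row is flipped compared with the template.
noisy_seven = ["111", "001", "010", "011", "010"]
result = recognize(noisy_seven)
print(result)
```

Even with one pixel wrong, the noisy glyph is still closest to the "7" template. Real documents add many complications, such as varying fonts, sizes, skew, and noise, which is why modern OCR relies on learned models rather than fixed templates.
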
Document understanding technology can be broken down into multiple problems:

  • What are the locations of different structures in the document?
  • How do we classify these structures into tables, forms, sentences, etc.?
  • How do we extract the identified structures to reconstruct and arrange them into readable and editable form for users?
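
The three questions above can be read as three stages of a pipeline: locate, classify, reconstruct. The sketch below shows that shape in Python. Everything in it is an assumption for illustration: the `Region` type, the sample text, and especially the rule-based `classify` function, which is only a crude stand-in for a trained model.

```python
import re
from dataclasses import dataclass

@dataclass
class Region:
    """A located structure: a bounding box in pixels plus the text lines inside it."""
    box: tuple   # (left, top, right, bottom)
    lines: list  # text lines found inside the box

def cells(line):
    """Split a line into cells wherever two or more spaces separate the text."""
    return [c for c in re.split(r"\s{2,}", line.strip()) if c]

def classify(region):
    """Crude rule-based stand-in for a learned structure classifier."""
    rows = [cells(line) for line in region.lines]
    if len(rows) >= 2 and all(len(r) >= 3 for r in rows):
        return "table"      # several lines, each with three or more cells
    if all(":" in line for line in region.lines):
        return "form"       # every line looks like "field: value"
    return "sentence"

def reconstruct(regions):
    """Order regions top to bottom, then left to right, and label each one."""
    ordered = sorted(regions, key=lambda r: (r.box[1], r.box[0]))
    return [(classify(r), r.lines) for r in ordered]

# Hypothetical regions, as a layout detector might return them (in any order).
regions = [
    Region((40, 300, 560, 380), ["Part No    Qty    Unit Price",
                                 "A-101    50    12.40"]),
    Region((40, 120, 300, 180), ["Supplier: Acme Castings", "Quote No: Q-7731"]),
    Region((40, 420, 560, 460), ["Prices are valid for 30 days."]),
]
kinds = [kind for kind, _ in reconstruct(regions)]
print(kinds)
```

The point of the sketch is the separation of concerns: each stage can be improved or replaced by a learned model without changing the others.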

In this blog we will look at different types of structures in documents and in the next blog we will look at the difficulty in reconstructing them and how ML engineers at Zumen solved it.

Structured Text:

An enterprise quotation or invoice can be a complex document with a lot of text and nested tables. When the document is read for decision making, such as comparing quotations to select the best-fit vendors or reviewing invoices before they are sent for payment, much of the data is presented as structured text. An example of structured data is a form, with fields and values.
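
As a small illustration of fields and values, the Python sketch below turns "Field: value" lines into a dictionary. The field names and values are made up for the example, and the sketch assumes the text has already been read off the page and that each field sits on its own line, which real forms often do not guarantee.

```python
def parse_form(lines):
    """Split each 'Field: value' line into a dictionary entry."""
    fields = {}
    for line in lines:
        if ":" not in line:
            continue  # not a field/value line, skip it
        name, value = line.split(":", 1)  # split on the first colon only
        fields[name.strip()] = value.strip()
    return fields

# Hypothetical quotation header, one field per line.
quotation_header = [
    "Supplier: Acme Castings",
    "Quotation No: Q-7731",
    "Valid Until: 2022-08-31",
    "Currency: USD",
]
form = parse_form(quotation_header)
print(form["Quotation No"])
```

Once the fields are in this shape, comparing quotations from different suppliers becomes a matter of looking up the same key in each dictionary.
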
An industrial document can contain many fields. Even people can find such documents, especially detailed ones, difficult to interpret. Let us rewind to a historical moment at the end of World War II, the signing of the Japanese surrender document.

The Canadian representative mistakenly signed in the wrong field, so the delegates who followed had to sign in the next available field, even though it was the wrong one. The Japanese delegation protested, and tension built briefly until General Richard Sutherland, General MacArthur's chief of staff, crossed out the incorrect titles and handwrote the correct ones under each signature, adding his initials to each correction. A machine that has been trained to understand a layout is less likely to make such mistakes.

Unstructured text

Headers, footers, numbered lists, bullet points, and paragraphs all belong to unstructured data. This type of data can be straightforward for us to understand, but machines need natural language capabilities to make sense of it. Terms and conditions are a good example of unstructured text.

Understanding the context and key information in unstructured text makes it much quicker to integrate with backend systems. We can also perform tasks like Visual Question Answering (VQA), which is a machine's ability to answer a question about a given image. When the image is a document, the task is called Document VQA (Doc VQA). If we need to find the supplier's contact information in an invoice's terms and conditions, running a query directly on the document can save a lot of time. This is particularly useful for quickly fetching results from unstructured text.
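
To show what such a query might return, here is a deliberately simple Python sketch. It is not Doc VQA: it uses a regular expression as a pattern-based stand-in, works on text that has already been extracted from the document, and handles only one kind of question (finding email addresses). The sample terms and the address in them are invented for the example.

```python
import re

# A simple email pattern; good enough for illustration, not a full validator.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def find_emails(text):
    """Return every email-like string found in the text, in order."""
    return EMAIL.findall(text)

# Hypothetical terms and conditions taken from an invoice.
terms = (
    "Payment is due within 30 days of the invoice date. "
    "Goods remain the property of the supplier until paid in full. "
    "For queries about this invoice, write to accounts@example.com."
)
contacts = find_emails(terms)
print(contacts)
```

A pattern like this breaks as soon as the question changes, for example to "who is the contact person?". That gap is what a Doc VQA model is meant to close: it takes a free-form question and the document image and produces the answer.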


A table is a compact arrangement of information in rows and columns that helps users understand findings and make comparisons easily. Tables come in various shapes and sizes, and they are among the key structures companies use to communicate. Not all companies follow the same format for their tables. If everyone followed a standard template with a properly defined grid, tables would be a slightly easier problem to solve. Given the variety of shapes and border styles they come in, the problem is much harder. Here is an example of a complex table structure.

Extracting tables from documents is a multi-step problem. First we have to find the table region and the number of rows and columns. Then we need to find the words inside each cell and arrange them in the right structure. There has been significant interest in this topic in the deep learning community, and many crowd-sourced datasets have been made public to help researchers come up with better solutions. This is the problem we set out to solve for quotation and invoice processing.
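
The row, column, and cell steps can be sketched in a few lines of Python. This is a simplified illustration, not the solution described in this series. It assumes the table region has already been found, that each text box holds the full content of one cell, that boxes are left-aligned within their column, and that a fixed pixel tolerance is enough to group positions. The function names and coordinates are hypothetical.

```python
def group_positions(values, tolerance):
    """Cluster 1-D positions: a new group starts when the gap exceeds tolerance."""
    groups = []
    for v in sorted(set(values)):
        if groups and v - groups[-1][-1] <= tolerance:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups

def index_of(value, groups):
    """Return the index of the group that contains the given position."""
    return next(i for i, g in enumerate(groups) if value in g)

def build_table(boxes, tolerance=5):
    """boxes: list of (text, x, y), the top-left corner of each cell's text."""
    row_groups = group_positions([y for _, _, y in boxes], tolerance)
    col_groups = group_positions([x for _, x, _ in boxes], tolerance)
    grid = [["" for _ in col_groups] for _ in row_groups]
    for text, x, y in boxes:
        grid[index_of(y, row_groups)][index_of(x, col_groups)] = text
    return grid

# Hypothetical OCR output: slightly misaligned boxes from a 3x3 table.
boxes = [
    ("Part", 40, 100), ("Qty", 200, 101), ("Price", 300, 99),
    ("A-101", 41, 130), ("50", 202, 131), ("12.40", 298, 130),
    ("B-220", 40, 160), ("8", 201, 160), ("97.00", 300, 161),
]
table = build_table(boxes)
for row in table:
    print(row)
```

The sketch works only because the sample table is a clean grid. Merged cells, multi-line cells, missing borders, and nested tables all break the fixed-tolerance grouping, which is exactly why the problem attracts learned approaches.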

How did we solve it?

What is the current state of accuracy and latency?

What are the pros and cons?
Stay tuned for our next part to learn more about the solution we designed with deep learning.
