An in-depth Guide to Automated Invoice Scanning Software
Automated invoice processing is a key back-office task that can save a great deal of time and cost if done correctly.
In this article, we will dive deep into automated invoice processing. We will look at how invoice scanning and data capture solutions work, the different solutions available on the market, and the factors to consider when choosing a vendor.
Most businesses these days receive invoices from vendors and suppliers digitally (PDF files, scanned images, photos sent over email). In order to make payments, the data from these invoices needs to be extracted and matched against purchase orders (PO-based invoices) or checked against received goods (non-PO-based invoices). In this whole process, invoice capture is the step of extracting structured data from invoices so that they can be processed automatically. As such, invoice capture is the first back-office process to be automated with AI for most companies.
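The matching step can be sketched roughly as follows. This is a minimal illustration, assuming extracted invoices and purchase orders are plain dictionaries; the field names (`po_number`, `total`), the status strings, and the tolerance are hypothetical and not taken from any particular product.

```python
from decimal import Decimal

def match_invoice_to_po(invoice, purchase_orders, tolerance=Decimal("0.01")):
    """Return (status, po) for an extracted invoice.

    purchase_orders maps a PO number to a dict with a 'total' amount.
    """
    po_number = invoice.get("po_number")
    if not po_number:
        # Non-PO invoice: it has to be checked against received goods instead.
        return "non_po", None
    po = purchase_orders.get(po_number)
    if po is None:
        return "po_not_found", None
    if abs(Decimal(invoice["total"]) - Decimal(po["total"])) <= tolerance:
        return "matched", po
    return "amount_mismatch", po


purchase_orders = {"PO-1001": {"total": "250.00"}}
invoice = {"po_number": "PO-1001", "total": "250.00"}
print(match_invoice_to_po(invoice, purchase_orders)[0])
```

Real matching usually compares more than the total (quantities, unit prices, vendor), but the control flow stays the same: PO-based invoices are looked up, everything else is routed to a goods-received check.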
Invoice scanning and data capture software automates the mundane task of manual data entry. It tries to recognize all key-value pairs and line items in your invoices and returns easy-to-handle structured data in formats such as JSON, CSV or XML. Once your PDF invoices are converted into structured data, you can easily use the data in your other applications such as accounting and ERP systems. There are several advantages to automating invoice processing for a business:
There are two main kinds of invoice capture software, namely template-based and machine learning-based. The key difference between the two approaches is how they extract data from invoices.
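To illustrate what "structured data" means here, the sketch below takes a made-up JSON result and flattens its line items to CSV using only the Python standard library. The field names are assumptions for illustration, not any vendor's actual schema.

```python
import csv
import io
import json

# Hypothetical extraction result; real tools use their own schemas.
SAMPLE_JSON = """
{
  "invoice_number": "INV-0042",
  "vendor": "Acme Supplies",
  "line_items": [
    {"description": "Paper A4", "quantity": 10, "unit_price": "4.50"},
    {"description": "Toner", "quantity": 2, "unit_price": "60.00"}
  ]
}
"""

def line_items_to_csv(invoice_json):
    """Flatten an invoice's line items into CSV text, one row per item."""
    invoice = json.loads(invoice_json)
    buffer = io.StringIO()
    fieldnames = ["invoice_number", "vendor", "description", "quantity", "unit_price"]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for item in invoice["line_items"]:
        # Repeat the header-level fields on every line-item row.
        writer.writerow({
            "invoice_number": invoice["invoice_number"],
            "vendor": invoice["vendor"],
            **item,
        })
    return buffer.getvalue()

print(line_items_to_csv(SAMPLE_JSON))
```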

For the majority of companies, the number of vendors is limited (fewer than 500) and 80% of the invoices come from a relatively small set of vendors. When the format of the invoice is known, it is relatively easy to train an OCR solution to extract data.
You only need to process a couple of invoices per vendor to train the software to extract data from that vendor's invoices afterwards. Since the format for a particular vendor doesn't change very often, this makes the system highly robust and very accurate.
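One simple way to picture a template is as a set of patterns written once per vendor and then applied to the OCR text of every later invoice. The sketch below assumes the OCR step has already produced plain text; the vendor name, field names, and regular expressions are hypothetical, and commercial tools typically use positional templates rather than regexes alone.

```python
import re

# Hypothetical per-vendor templates: each field maps to a regex with one
# capture group, written after looking at a couple of sample invoices.
TEMPLATES = {
    "Acme Supplies": {
        "invoice_number": r"Invoice No\.\s*(\S+)",
        "total": r"Total Due:\s*\$?([\d,]+\.\d{2})",
    },
}

def extract_with_template(vendor, ocr_text):
    """Apply the vendor's template to OCR text; missing fields become None."""
    template = TEMPLATES.get(vendor)
    if template is None:
        raise KeyError(f"No template for vendor {vendor!r}")
    result = {}
    for field, pattern in template.items():
        match = re.search(pattern, ocr_text)
        result[field] = match.group(1) if match else None
    return result

text = "Acme Supplies\nInvoice No. A-7731\nTotal Due: $1,250.00"
print(extract_with_template("Acme Supplies", text))
```

The approach works well as long as the vendor keeps its layout and labels stable, and fails loudly (here with a `KeyError`) for a vendor that has no template yet.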
In most situations, you will have invoices coming in from a long tail of vendors with varying formats and invoice data. In such cases, it is necessary to use a machine learning-based solution that can detect key-value pairs and tables in unknown layouts.
If you happen to have a wide variety of vendors, it becomes important to train the software on your dataset. Most invoice data extraction tools come with a pre-trained model, but you can get much higher accuracy by training on your own dataset.
Software such as Docsumo combines the best of both worlds: it 'remembers' vendor formats without the user having to specify them and defaults to a machine learning-based algorithm when an invoice from a new vendor is detected.
This means that you don't need to create templates for each vendor; the software creates them for you in the background as you start processing invoices. Solutions that learn continuously in this way can improve data extraction accuracy within a short period of time once you start using them.
Below is the list of companies that provide invoice capture software.
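The control flow of such a hybrid approach might look like the sketch below. It is only an illustration of the idea, not Docsumo's or any other vendor's implementation: the `HybridExtractor` class, the `fake_ml_extract` stand-in, and the representation of a document as a dictionary of page regions are all assumptions.

```python
class HybridExtractor:
    """Use a remembered vendor format when available, else a generic model."""

    def __init__(self, ml_extract):
        self.ml_extract = ml_extract   # fallback for unknown vendors
        self.vendor_formats = {}       # vendor -> {field: region name}

    def extract(self, vendor, document):
        fmt = self.vendor_formats.get(vendor)
        if fmt is not None:
            # Known vendor: read each field from its remembered region.
            fields = {field: document.get(region) for field, region in fmt.items()}
            return fields, "template"
        return self.ml_extract(document), "ml"

    def confirm(self, vendor, field_regions):
        """Remember where fields were found once a user has verified them."""
        self.vendor_formats[vendor] = dict(field_regions)


def fake_ml_extract(document):
    # Placeholder: a real model would predict fields from layout and text.
    return {"invoice_number": document.get("top_right")}

extractor = HybridExtractor(fake_ml_extract)
doc = {"top_right": "INV-9", "bottom_right": "120.00"}
print(extractor.extract("New Vendor", doc))   # first invoice: falls back to ML
extractor.confirm("New Vendor", {"invoice_number": "top_right", "total": "bottom_right"})
print(extractor.extract("New Vendor", doc))   # later invoices: remembered format
```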
Automated invoice data capture is still a problem that has not been fully solved. Since the data types in invoices (invoice number, taxes, warehouse details, shipping details), the representation of this data ("Invoice No.", "Invoice #", "invoice number"), and the format of the invoices vary a lot, software has a hard time achieving 100% accuracy in data extraction. Though machine learning techniques are evolving rapidly, capturing line items that span multiple pages is still challenging.
So how much accuracy can you expect from invoice capture software? In short, it really depends. For clean invoices in a narrow variety of formats, you can get between 95% and 99% accuracy. In most practical situations, expect an accuracy between 80% and 95%. The only way to know for sure is to try the software and see how it works on your dataset.
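The label-variation problem can be made concrete with a small sketch that maps raw labels to canonical field names. The synonym table below is hypothetical and deliberately tiny; a hand-maintained table like this breaks down quickly across many vendors, which is part of why learned models are used instead.

```python
import re

# Hypothetical synonym table; extend it with the labels seen in your invoices.
LABEL_SYNONYMS = {
    "invoice_number": {"invoice no", "invoice #", "invoice number", "inv no"},
    "po_number": {"po number", "purchase order", "po #", "po no"},
}

def canonical_field(label):
    """Map a raw label such as 'Invoice No.' to a canonical field name."""
    cleaned = re.sub(r"[.:]", "", label).strip().lower()
    cleaned = re.sub(r"\s+", " ", cleaned)
    for field, synonyms in LABEL_SYNONYMS.items():
        if cleaned in synonyms:
            return field
    return None  # unknown label

for raw in ["Invoice No.", "Invoice #", "invoice number", "Ship To"]:
    print(raw, "->", canonical_field(raw))
```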
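If you run a trial on your own invoices, one straightforward way to quantify the result is field-level accuracy: the share of expected fields whose extracted value exactly matches a manually verified value. The sketch below assumes both sets are lists of dictionaries in the same order; the field names and sample values are made up.

```python
def field_accuracy(predictions, ground_truth):
    """Fraction of expected fields whose extracted value matches exactly.

    Both arguments are lists of dicts, one dict per invoice, in the same order.
    """
    correct = total = 0
    for predicted, expected in zip(predictions, ground_truth):
        for field, value in expected.items():
            total += 1
            if predicted.get(field) == value:
                correct += 1
    return correct / total if total else 0.0

truth = [{"invoice_number": "A-1", "total": "100.00"},
         {"invoice_number": "B-2", "total": "55.10"}]
preds = [{"invoice_number": "A-1", "total": "100.00"},
         {"invoice_number": "B-2", "total": "56.10"}]  # one wrong total
print(f"{field_accuracy(preds, truth):.0%}")
```

Exact matching is strict by design: a total that is off by one digit counts as wrong, which is usually what matters for payments. Document-level accuracy (every field on an invoice correct) will be lower than the field-level figure.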
A couple of things to consider while measuring accuracy:
When choosing a vendor, check for the following things:
Choose a vendor whose data privacy policy is in line with your company policies. More often than not, this can be a showstopper if your company policies do not allow the use of external APIs for processing invoices. Also, check with the vendor how long they store your data. In some cases, your company would need to keep the data for an extended period, while in other cases, the data might need to be deleted after processing.
As no software is perfect, it is recommended to check the data extraction accuracy delivered by the software. If you need to process thousands of invoices, it might make sense to do a pilot to check the software before purchasing.
Most invoice capture vendors charge per document processed, plus a setup fee if you have special integration requirements. You can compare different providers based on pricing if everything else is equal.
Check how the software learns from your invoice data. The best software (e.g. Docsumo) 'remembers' how you extracted data for a particular invoice and also learns across all samples using machine learning.
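Because a setup fee and a per-document price trade off against each other, the cheaper option depends on your volume. The short calculation below uses invented quotes purely for illustration; substitute the figures from the vendors you are evaluating.

```python
def total_cost(documents, price_per_document, setup_fee=0.0):
    """Total cost under a per-document price plus a one-time setup fee."""
    return setup_fee + documents * price_per_document

# Hypothetical quotes, for illustration only.
quotes = {
    "Vendor A": {"price_per_document": 0.10, "setup_fee": 500.0},
    "Vendor B": {"price_per_document": 0.15, "setup_fee": 0.0},
}

for volume in (5_000, 20_000):
    costs = {name: total_cost(volume, **quote) for name, quote in quotes.items()}
    cheapest = min(costs, key=costs.get)
    print(f"{volume} documents: cheapest is {cheapest}")
```

With these made-up numbers, the vendor without a setup fee is cheaper at low volume, and the one with the lower per-document price wins once the volume is high enough to absorb the setup fee.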
Since your office staff will be using the software, it is important to check how easy it is to use the software and whether making minor modifications to the extracted data is convenient.
Most software keeps a human in the loop to catch false positives (e.g. wrongly extracting the purchase order number as the invoice number). Check if the invoice capture vendor provides a data extraction service in addition to the software. This can give you a completely automated solution, rather than having to validate the extracted data in-house.
Since the invoice data will be consumed by other software, ask the vendor about integration options. Most software, such as Docsumo, integrates directly via an API or provides a CSV/Excel download option.
You can check if the vendor has good reviews online and case studies from other customers in your industry. This can help you understand the company's background and help with vendor selection.
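A human-in-the-loop step is often driven by simple sanity checks that decide which extractions a person should look at. The sketch below is one possible set of rules under assumed field names (`invoice_number`, `po_number`, `subtotal`, `line_items`); it is not how any specific product decides, and real systems typically also use model confidence scores.

```python
def needs_review(extracted):
    """Return a list of reasons an extraction should go to a human reviewer."""
    reasons = []
    invoice_no = extracted.get("invoice_number")
    po_no = extracted.get("po_number")
    if not invoice_no:
        reasons.append("missing invoice number")
    elif invoice_no == po_no:
        # Typical false positive: the PO number was captured as invoice number.
        reasons.append("invoice number equals PO number")
    line_total = sum(item["amount"] for item in extracted.get("line_items", []))
    # If no subtotal was extracted, skip the check by comparing with itself.
    if abs(line_total - extracted.get("subtotal", line_total)) > 0.01:
        reasons.append("line items do not add up to subtotal")
    return reasons

suspicious = {
    "invoice_number": "PO-77",
    "po_number": "PO-77",
    "subtotal": 30.5,
    "line_items": [{"amount": 10.0}, {"amount": 20.5}],
}
print(needs_review(suspicious))
```

An empty list means the extraction can flow straight through; anything else is queued for manual validation.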
Electronic Data Interchange, or EDI, specifies standards by which businesses can exchange data. Since the data is exchanged in a structured, machine-readable format (for example ANSI X12, UN/EDIFACT, or an XML-based format), it is processed directly by the receiving software without the need for human intervention.
However, this requires that businesses at both ends use the same standard for data exchange. If you have a few really large vendors who invoice you regularly, you can look into EDI. In most cases, EDI alone is not sufficient: even if only a few vendors send PDF files or paper documents periodically, you will still need another system to process those invoices.
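As a rough illustration of why structured exchange needs no capture step, the sketch below parses a made-up XML invoice with Python's standard library. The element names are invented for this example and do not follow any particular EDI standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML invoice; element names are illustrative, not from a standard.
XML_INVOICE = """
<Invoice>
  <Number>INV-2024-001</Number>
  <Total currency="USD">1180.00</Total>
  <Line><Description>Widgets</Description><Amount>1000.00</Amount></Line>
  <Line><Description>Tax</Description><Amount>180.00</Amount></Line>
</Invoice>
"""

def parse_invoice(xml_text):
    """Turn the XML invoice into a plain dict; no OCR or manual entry involved."""
    root = ET.fromstring(xml_text.strip())
    total = root.find("Total")
    return {
        "invoice_number": root.findtext("Number"),
        "currency": total.get("currency"),
        "total": total.text,
        "lines": [
            {"description": line.findtext("Description"),
             "amount": line.findtext("Amount")}
            for line in root.findall("Line")
        ],
    }

print(parse_invoice(XML_INVOICE))
```

Every field arrives already labeled, so the receiving system only has to read it, which is the property that PDF and paper invoices lack.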
As we have seen in this article, automating invoice processing is very much possible provided you are aware of the current technology, are able to define your use case properly and choose the right vendor.
We hope this article gives you a good picture of the invoice capture software market and helps you make a decision. We at Docsumo have built document data extraction software for just this purpose. Why not give us a try? Schedule a demo with us and find out how we can add value to your system.