Best in class for capturing data from financial documents
“We are using Docsumo’s APIs for automating data capture from bank statements and identity cards while on-boarding customers. It has reduced the time our operations team spends on data entry manifold while providing a much better customer experience.”
About the customer
The case study: In a nutshell
Process unstructured ID & income verification documents
- Payu collects identity proof, address proof and income proof from each customer for onboarding.
Identify & classify documents
- Payu needs to classify 7 different document types for each application and queue them for manual data extraction.
- Data to extract includes ID and income verification details, tax details, and transaction details.
Capture data from bank statements with 100+ layouts from 100+ banks
- Not only did the structure vary across bank statements, but the position of the data to capture also varied from document to document.
- Some of the data was in tabular format.
Categorize & derive attributes from extracted data
- The manual extraction lacked logical validation of payment and transaction details.
The Docsumo Solution
Ingesting ID & income verification documents
- API-based direct integration that seamlessly ingests Bank Statements, Checks, Passports, Driving Licenses, Voter IDs, National IDs (Aadhaar), and Utility Bills into Docsumo.
Pre-processing and getting ready for data extraction
- Inbuilt document pre-processors identified the file formats (JPG, PDF, PNG, etc.) and queued the documents up for data extraction.
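As a rough illustration of the format-identification step, the sketch below detects JPG, PNG, and PDF files from their leading bytes (file signatures) and queues the recognized ones. It is not Docsumo's pre-processor; the names `detect_format` and `enqueue_supported` are hypothetical and chosen only for this example.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# Well-known file signatures (magic bytes) for the formats named above.
SIGNATURES = {
    b"\xff\xd8\xff": "JPG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"%PDF-": "PDF",
}


def detect_format(data: bytes) -> Optional[str]:
    """Return the format name if the leading bytes match a known signature."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def enqueue_supported(files: Dict[str, bytes]) -> Deque[Tuple[str, str]]:
    """Queue (filename, format) pairs for extraction; skip unknown formats."""
    queue: Deque[Tuple[str, str]] = deque()
    for filename, data in files.items():
        fmt = detect_format(data)
        if fmt is not None:
            queue.append((filename, fmt))
    return queue
```

Checking the content rather than the file extension means a mislabeled upload (for example, a PDF saved with a .jpg extension) is still routed correctly.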
Data extraction from unstructured text
- Docsumo's OCR module used vectorized position references within each document to extract data.
- The OCR not only parsed documents with varying fonts, layouts, image quality, and resolution, but also extracted data from tables with 95%+ accuracy.
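To give a feel for position-based extraction, here is a minimal sketch that looks up the value printed to the right of a label on the same text line, using word coordinates of the kind an OCR engine typically returns. The `Word` structure, the `value_right_of` helper, and the tolerance value are assumptions made for illustration and do not describe Docsumo's OCR module.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    x: float  # left edge of the word's bounding box
    y: float  # vertical centre of the word's bounding box


def value_right_of(label: str, words: List[Word], y_tol: float = 5.0) -> Optional[str]:
    """Return the nearest word to the right of `label` on the same text line."""
    anchor = next((w for w in words if w.text.lower() == label.lower()), None)
    if anchor is None:
        return None
    # Same line = vertical centres within y_tol; right of = larger x.
    candidates = [
        w for w in words
        if w is not anchor and abs(w.y - anchor.y) <= y_tol and w.x > anchor.x
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w.x).text
```

Because the lookup is relative to the label rather than to a fixed page coordinate, the same rule can work across layouts where the field moves around the page.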
Intelligent categorization of key-value pairs
- Our proprietary NLP-based classification framework started rapidly learning from all the documents. It was trained to categorize key-value pairs and line items.
- Another algorithm started making intelligent predictions to identify the data within a document.
Rule-based data validation
- Once the data was extracted, a rule-based validation engine applied contextual data validation and correction algorithms.
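One example of the kind of rule such an engine could apply to bank statements is a running-balance check: each row's balance should equal the previous balance minus the debit plus the credit. The sketch below is an illustrative assumption, not Docsumo's validation engine; `Transaction` and `find_balance_mismatches` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class Transaction:
    date: str
    debit: Decimal
    credit: Decimal
    balance: Decimal  # running balance printed on the statement


def find_balance_mismatches(opening: Decimal, rows: List[Transaction]) -> List[int]:
    """Return indexes of rows whose balance does not follow from the previous one."""
    mismatches = []
    previous = opening
    for i, row in enumerate(rows):
        expected = previous - row.debit + row.credit
        if expected != row.balance:
            mismatches.append(i)
        # Continue from the printed balance so one bad row does not cascade.
        previous = row.balance
    return mismatches
```

Rows flagged this way usually point to a misread digit in the debit, credit, or balance column and can be sent for correction or review. `Decimal` is used instead of `float` so that currency amounts compare exactly.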
Integration with downstream software
- The data was extracted in JSON format and integrated into NDR's Salesforce instance via APIs and an iframe.
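For a sense of what consuming such JSON output downstream might look like, the sketch below flattens a payload of key-value fields into a single record that could be mapped onto CRM fields. The payload shape (`type`, `fields`, `key`, `value`) and the `flatten_fields` helper are assumptions for illustration only and are not Docsumo's actual response schema.

```python
import json
from typing import Any, Dict


def flatten_fields(payload: str) -> Dict[str, Any]:
    """Turn a JSON document of key-value fields into one flat record."""
    doc = json.loads(payload)
    record: Dict[str, Any] = {"document_type": doc["type"]}
    for field in doc["fields"]:
        record[field["key"]] = field["value"]
    return record


# Hypothetical payload, shaped only for this example.
sample = json.dumps({
    "type": "bank_statement",
    "fields": [
        {"key": "account_holder", "value": "A. Kumar"},
        {"key": "closing_balance", "value": "800.00"},
    ],
})
record = flatten_fields(sample)
```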
Result: 99%+ data extraction accuracy