Many companies have modernized their backend systems and digitized their data, but user-facing systems are the slowest and most difficult to modernize. Third-party integrations and established processes slow them down, and above all people naturally resist change. Paper forms with handwritten data fall into this category. They can be hard to eliminate in the short term, and they can become a bottleneck on a company's path to Digital Transformation, keeping it from reaping the full benefits. This has given rise to many automated document processing systems. Azure Form Recognizer aids this process by extracting text and packaging it into usable data formats.
Typically, people read these paper forms and key the data into systems. This process is error-prone, and it does not scale: it cannot keep up as volumes grow over time. Hence there is a need to move away from this laborious process. Until that happens, Form Recognizer can perform this data entry in a more scalable fashion.
What is Azure Form Recognizer?
Azure Form Recognizer, part of Azure Cognitive Services, is a cloud service that uses machine learning to extract text and structured data from documents. It returns that information in context as JSON-formatted results that contain text, paragraphs, tables, checkboxes, key-value pairs, and so on. This lets us focus on how to use the data in downstream systems.
Form Recognizer can detect multiple languages in a single document. It recognizes printed text in over 160 languages. It also recognizes handwritten text in a smaller set of languages, including English, Japanese, Chinese Simplified, Korean, French, Portuguese, German, Spanish, and Italian.
In terms of document types, it can intelligently recognize and extract data from various kinds of receipts, invoices, and IDs. It can also extract data from documents that do not fall into these categories. In that case it returns results in a generic format that is still easy for downstream systems to consume. You can also build a custom model to process your specific set of forms.
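The following minimal Python sketch pulls key-value pairs out of an analysis response, which makes the shape of these results concrete. It assumes the JSON layout of the v3 REST API: an `analyzeResult` object with a `keyValuePairs` array. Each item in that array carries `key.content`, an optional `value.content`, and a `confidence`. The sample response is hand-written, abbreviated, and filled with made-up values. `extract_key_values` is a hypothetical helper, not part of any Azure SDK.

```python
import json
from typing import Dict, Optional, Tuple

# Hand-written, abbreviated sample shaped like a v3 REST analyze response.
# Real responses carry many more properties (pages, spans, boundingRegions...).
SAMPLE_RESPONSE = """
{
  "status": "succeeded",
  "analyzeResult": {
    "keyValuePairs": [
      {"key": {"content": "Name:"}, "value": {"content": "Jane Doe"}, "confidence": 0.94},
      {"key": {"content": "Date:"}, "value": {"content": "2022-10-03"}, "confidence": 0.88},
      {"key": {"content": "Signature:"}, "confidence": 0.41}
    ]
  }
}
"""


def extract_key_values(response_text: str) -> Dict[str, Tuple[Optional[str], float]]:
    """Map each detected key to (value text, confidence).

    A key can be detected without a value (for example, an empty field),
    in which case the value is None.
    """
    result = json.loads(response_text)["analyzeResult"]
    pairs: Dict[str, Tuple[Optional[str], float]] = {}
    for pair in result.get("keyValuePairs", []):
        key = pair["key"]["content"].rstrip(":").strip()  # "Name:" -> "Name"
        value = pair.get("value", {}).get("content")
        pairs[key] = (value, pair["confidence"])
    return pairs


for key, (value, confidence) in extract_key_values(SAMPLE_RESPONSE).items():
    print(f"{key}: {value!r} (confidence {confidence:.2f})")
```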
How to use Azure Form Recognizer in your document processing workflow?
At a very high level, documents are scanned (converted to images) and fed to Form Recognizer through a wrapper application. Form Recognizer extracts the text and returns the information to the caller as a well-formatted JSON document. The caller processes the data and persists it in a data store for other systems.
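The following minimal Python sketch runs this workflow end to end in a few lines. `fake_analyze` is a hypothetical stand-in for the wrapper's call to Form Recognizer. The field names `name` and `policy_number` are made up. An in-memory SQLite database plays the role of the data store.

```python
import sqlite3
from typing import Callable, Dict, Iterable

# Hypothetical signature of the wrapper's call to Form Recognizer:
# scanned image bytes in, extracted fields out.
AnalyzeFn = Callable[[bytes], Dict[str, str]]


def fake_analyze(image: bytes) -> Dict[str, str]:
    """Stand-in for the service call; pretends the scan holds a policy number."""
    return {"name": " Jane Doe ", "policy_number": image.decode().strip()}


def normalize(fields: Dict[str, str]) -> Dict[str, str]:
    """Caller-side processing: trim whitespace and upper-case identifiers."""
    cleaned = {key: value.strip() for key, value in fields.items()}
    cleaned["policy_number"] = cleaned["policy_number"].upper()
    return cleaned


def ingest(scans: Iterable[bytes], analyze: AnalyzeFn, db: sqlite3.Connection) -> int:
    """Analyze each scan, process the fields, and persist them for other systems."""
    db.execute("CREATE TABLE IF NOT EXISTS forms (name TEXT, policy_number TEXT)")
    count = 0
    for scan in scans:
        record = normalize(analyze(scan))
        db.execute(
            "INSERT INTO forms (name, policy_number) VALUES (?, ?)",
            (record["name"], record["policy_number"]),
        )
        count += 1
    db.commit()
    return count


db = sqlite3.connect(":memory:")
print(ingest([b"ab-123", b" cd-456 "], fake_analyze, db))
```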
Azure Form Recognizer Models
Form Recognizer provides the following types of models:
- The Read (OCR) model extracts only printed and handwritten text.
- The Layout analysis model adds document structure information, such as tables and selection marks, to the above.
- The General Document model adds key-value pairs to the above.
- Prebuilt models process common document types such as receipts, invoices, vaccination cards, health insurance cards, and business cards.
- Custom models process forms that are specific to your business.
The first three models return low-level information that may need significant processing before the data is ready for consumption. The last two produce more intelligent output that is easier to consume. Custom models return key-value pairs keyed by the field names you define, so their output is the most intelligent of all.
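A wrapper application that has to pick a model for each document can often get by with a small lookup. The model IDs below are the identifiers used by the v3.0 API: `prebuilt-read`, `prebuilt-layout`, `prebuilt-document`, `prebuilt-receipt`, `prebuilt-invoice`, `prebuilt-idDocument`, and `prebuilt-businessCard`. The category names, the `choose_model` helper, and the custom model ID `my-claims-model` are hypothetical. This Python sketch prefers a prebuilt model, falls back to a custom model, and otherwise uses the General Document model.

```python
from typing import Optional

# Model IDs used by the Form Recognizer v3.0 API for the families listed above.
# The dictionary keys are hypothetical category names chosen for this sketch.
GENERAL_MODELS = {
    "text": "prebuilt-read",            # printed and handwritten text only
    "layout": "prebuilt-layout",        # text plus structure (tables, selection marks)
    "key_values": "prebuilt-document",  # layout plus key-value pairs
}
PREBUILT_MODELS = {
    "receipt": "prebuilt-receipt",
    "invoice": "prebuilt-invoice",
    "id": "prebuilt-idDocument",
    "business_card": "prebuilt-businessCard",
}


def choose_model(doc_type: str, custom_model_id: Optional[str] = None) -> str:
    """Prefer a prebuilt model, then a custom model, then General Document."""
    if doc_type in PREBUILT_MODELS:
        return PREBUILT_MODELS[doc_type]
    if custom_model_id:
        return custom_model_id
    return GENERAL_MODELS["key_values"]


print(choose_model("invoice"))                       # prebuilt-invoice
print(choose_model("claim_form", "my-claims-model"))  # my-claims-model
print(choose_model("claim_form"))                    # prebuilt-document
```

Preferring prebuilt models also reflects the pricing difference discussed under Costs.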
Development choices and tools
Developers can work with Form Recognizer on all major platforms (Windows, macOS, Linux, Docker, and JavaScript in any browser), using either the REST API or the SDKs.
The implementation process is as follows:
- The client application captures the form as an image or PDF, encodes it as Base64, and POSTs it to the Form Recognizer endpoint. The POST call returns a URL that the client polls for results.
- While the analysis is in progress, the polling calls return a status of “running”. When the status changes to “succeeded”, processing is complete and the client receives the results in the response payload.
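In the v3.0 REST API, these two steps work roughly as follows. The client POSTs a JSON body with a `base64Source` property to `{endpoint}/formrecognizer/documentModels/{modelId}:analyze?api-version=2022-08-31` and authenticates with the `Ocp-Apim-Subscription-Key` header. The service answers `202 Accepted` and puts the result URL in the `Operation-Location` response header.

The minimal Python sketch below builds the request and implements the polling loop. To stay self-contained, it receives the HTTP GET as an injected `get_json` function. That parameter is hypothetical; you would back it with an HTTP library that sends the key header. The endpoint in the test is a placeholder.

```python
import base64
import json
import time
from typing import Any, Callable, Dict, Tuple

API_VERSION = "2022-08-31"  # Form Recognizer v3.0 GA REST API version


def build_analyze_request(endpoint: str, model_id: str, document: bytes) -> Tuple[str, str]:
    """Return the URL and JSON body for the analyze POST call."""
    url = (f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
           f"{model_id}:analyze?api-version={API_VERSION}")
    body = json.dumps({"base64Source": base64.b64encode(document).decode("ascii")})
    return url, body


def poll_result(get_json: Callable[[str], Dict[str, Any]], result_url: str,
                interval: float = 1.0, timeout: float = 60.0) -> Dict[str, Any]:
    """Poll the Operation-Location URL until the analysis finishes."""
    deadline = time.monotonic() + timeout
    while True:
        payload = get_json(result_url)
        status = payload["status"]
        if status == "succeeded":
            return payload["analyzeResult"]
        if status == "failed":
            raise RuntimeError(f"analysis failed: {payload.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} s")
        time.sleep(interval)  # "notStarted" or "running": wait, then ask again
```

The SDKs wrap this submit-and-poll pattern for you. In the Python package `azure-ai-formrecognizer`, for example, `DocumentAnalysisClient.begin_analyze_document` returns a poller, and the poller's `result()` method waits for completion.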
These POST and GET (polling) calls can be made through either the SDK or the REST API. Both take the model name as a parameter. For custom models, you first need to create the model and then use its name in the calls. Microsoft provides a companion tool called Form Recognizer Studio for creating custom models. To create and train a model, you need at least five samples of the same form. You may need a few more if the accuracy and confidence are not good enough or if there are mild variations in the form structure. To handle multiple form types, create one custom model per form type and combine them into a single composed model. Form Recognizer Studio provides an excellent environment to create and test models before you use them in your development cycle.
Success Criteria
Confidence versus Accuracy
For every value it extracts, Form Recognizer reports a confidence score between 0 and 1. Confidence measures how sure the engine is of its reading of the value. It is not the same as the accuracy of the value. Accuracy is the proportion of extracted values that exactly match the true values. Depending on the context, a “value” may be an individual character, a word, or a sentence. Typically, if 9 out of 12 items are extracted correctly, the extraction accuracy is 75%. The exact definition of accuracy can vary with the context. Ultimately, accuracy is what determines the success of your solution.
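Accuracy has to be measured against known correct values, which confidence alone cannot provide. The following minimal Python sketch assumes field-level exact matching, which is one of the possible definitions mentioned above. The field names and values in the usage are made up.

```python
from typing import Dict


def extraction_accuracy(extracted: Dict[str, str], expected: Dict[str, str]) -> float:
    """Share of expected fields whose extracted value matches exactly.

    Here a "value" is a whole field; a field the engine missed counts as wrong.
    """
    if not expected:
        raise ValueError("at least one expected value is required")
    correct = sum(1 for field, truth in expected.items()
                  if extracted.get(field) == truth)
    return correct / len(expected)


# 12 known-correct fields; the engine gets two wrong and misses one entirely.
expected = {f"field{i}": str(i) for i in range(12)}
extracted = dict(expected)
extracted["field0"] = "wrong"
extracted["field1"] = "wrong"
del extracted["field2"]
print(extraction_accuracy(extracted, expected))  # 0.75
```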
For the prebuilt models (receipts, invoices, etc.), there isn't much you can do to improve accuracy, because Microsoft's product team takes care of that for you. If you are not satisfied with their accuracy, contact Microsoft Support, who may be able to help you work around the limitations.
For custom models that you build for your own forms, you can use the confidence and accuracy metrics together to tune the model:
- If accuracy and confidence are both high, you have a well-balanced model.
- If accuracy is high but confidence is low, this typically points to a mismatch between the samples used to train the model and the tested sample. Try retraining the model with samples that more closely resemble the test sample.
- If both accuracy and confidence are low, revisit the training samples, separate them into groups of similar forms, train a separate model for each group, and combine them into a composed model.
- Low accuracy combined with high confidence is highly unlikely.
Refer to “Accuracy and confidence scores for custom models” in the Microsoft documentation for more details.
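The four cases above can be turned into a simple checklist for test runs. This Python sketch is illustrative only. The 0.8 thresholds are arbitrary assumptions, not Microsoft guidance. Using the mean field confidence as the model's confidence is also a simplification.

```python
from statistics import mean
from typing import Sequence


def tuning_advice(accuracy: float, confidences: Sequence[float],
                  accuracy_threshold: float = 0.8,
                  confidence_threshold: float = 0.8) -> str:
    """Map a test run's accuracy and field confidences to the four cases above.

    The default thresholds are illustrative assumptions, not official guidance.
    """
    high_accuracy = accuracy >= accuracy_threshold
    high_confidence = mean(confidences) >= confidence_threshold
    if high_accuracy and high_confidence:
        return "well-balanced model"
    if high_accuracy:
        return "retrain with samples closer to the test documents"
    if not high_confidence:
        return "split the samples into separate models and compose them"
    return "highly unlikely combination"


print(tuning_advice(0.92, [0.90, 0.95, 0.88]))  # well-balanced model
```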
Performance
A typical call to the Form Recognizer service returns results in about 5 to 8 seconds. This makes it well suited to batch or backend processing, or to forms submitted by email. Real-time or near-real-time use needs more thought, for example when forms are uploaded through a browser or photographed in a mobile app. In that case you may have to look at the entire workflow and devise alternate mechanisms to keep the user engaged, such as intermediate progress updates in the browser or mobile notifications.
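For batch or backend processing, throughput matters more than the latency of individual calls. Each call spends most of its time waiting on the service, so several documents can be analyzed concurrently. The following minimal Python sketch uses a thread pool. `slow_fake_analyze` is a hypothetical stand-in that sleeps for 0.2 s instead of calling the service. In practice, keep `max_workers` within the request-rate limits of your Form Recognizer resource.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def analyze_batch(paths: List[str], analyze: Callable[[str], Dict],
                  max_workers: int = 4) -> Dict[str, Dict]:
    """Run several slow, I/O-bound analyze calls at the same time.

    The waits overlap instead of adding up, so a batch of N documents takes
    roughly N / max_workers call durations rather than N.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(analyze, paths)  # preserves input order
        return dict(zip(paths, results))


def slow_fake_analyze(path: str) -> Dict:
    """Hypothetical stand-in for the service round trip."""
    time.sleep(0.2)
    return {"source": path}


start = time.monotonic()
results = analyze_batch([f"form{i}.pdf" for i in range(8)], slow_fake_analyze)
print(f"{len(results)} documents in {time.monotonic() - start:.1f} s")
```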
Costs
For a small number of pages per month, the pay-as-you-go model works well. An enterprise processes large volumes of pages, so the Commitment Tier pricing options (an upfront monthly payment at a discount) are usually more economical. Depending on your agreement with Microsoft, other pricing options may also be available to you.
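Whether a commitment tier pays off is simple arithmetic once you know your monthly volume. The Python sketch below models a commitment tier as a fixed monthly fee that covers a page quota, with extra pages billed at an overage rate. Every price in it is a made-up placeholder. Substitute the figures from the Azure pricing page or from your agreement with Microsoft.

```python
def payg_cost(pages: int, rate_per_1000: float) -> float:
    """Pay-as-you-go: every page is billed at the same rate."""
    return pages / 1000 * rate_per_1000


def commitment_cost(pages: int, monthly_fee: float, included_pages: int,
                    overage_per_1000: float) -> float:
    """Commitment tier (assumed model): fixed fee plus overage beyond the quota."""
    extra_pages = max(0, pages - included_pages)
    return monthly_fee + extra_pages / 1000 * overage_per_1000


def cheaper_option(pages: int) -> str:
    # Placeholder prices in arbitrary currency units, NOT Microsoft's rates.
    payg = payg_cost(pages, rate_per_1000=10.0)
    commitment = commitment_cost(pages, monthly_fee=4000.0,
                                 included_pages=500_000, overage_per_1000=8.0)
    return "pay-as-you-go" if payg <= commitment else "commitment tier"


for volume in (20_000, 600_000):
    print(volume, cheaper_option(volume))
# 20000 pay-as-you-go
# 600000 commitment tier
```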
Microsoft charges per page. If your document contains 5 pages, you are charged 5 times the per-page rate to process it, even though the results are returned per document.
Rates also vary by model type: custom models cost roughly 3 to 5 times as much as prebuilt models. So it is worth exploring whether you can modify your forms so that the prebuilt models can process them.
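Billing is per page while results come back per document, so cost estimates should be driven by page counts. The following minimal Python sketch uses placeholder per-page rates, not Microsoft's prices. The custom rate is set to four times the prebuilt rate only so that it falls within the 3 to 5 times range above.

```python
from typing import Dict, List

# Placeholder per-page rates in arbitrary currency units, NOT Microsoft's
# prices. The custom rate is 4x the prebuilt one only to sit inside the
# 3-5x range quoted in the text.
RATE_PER_PAGE: Dict[str, float] = {"prebuilt": 0.01, "custom": 0.04}


def batch_cost(pages_per_document: List[int], model_kind: str) -> float:
    """Billing is per page: a 5-page document costs 5 x the per-page rate."""
    return sum(pages_per_document) * RATE_PER_PAGE[model_kind]


documents = [5, 1, 2]  # page counts of three scanned documents (8 pages total)
for kind in RATE_PER_PAGE:
    print(f"{kind}: {batch_cost(documents, kind):.2f}")
# prebuilt: 0.08
# custom: 0.32
```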
Conclusion
Form Recognizer is an excellent tool for extracting data from forms, and it performs even better with a little help from us. Making forms OCR-friendly, legible handwriting, and good-quality scans all improve confidence, accuracy, and the overall result. Ultimately, though, the goal should be to eliminate these paper forms altogether by replacing them with an entirely different, digital process. Until then, Form Recognizer can get you started on your Digital Transformation journey. The data will still need human oversight, because, as with any AI technology, the results are probabilistic. But human involvement becomes the exception rather than the norm.
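One common way to make human review the exception is to send only low-confidence fields to a person. The following minimal Python sketch assumes a hypothetical confidence threshold of 0.85, which you would tune on your own test results. Confidence is not accuracy, so it is still wise to spot-check a sample of the automatically accepted values.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, float]  # (extracted value, confidence)


def triage(fields: Dict[str, Field],
           threshold: float = 0.85) -> Tuple[Dict[str, str], List[str]]:
    """Accept confident fields automatically; queue the rest for human review.

    The default threshold is a hypothetical starting point, not a recommendation.
    """
    accepted: Dict[str, str] = {}
    needs_review: List[str] = []
    for name, (value, confidence) in fields.items():
        if confidence >= threshold:
            accepted[name] = value
        else:
            needs_review.append(name)
    return accepted, needs_review


accepted, review = triage({"name": ("Jane Doe", 0.97), "date": ("2O22-10-03", 0.62)})
print(accepted, review)  # {'name': 'Jane Doe'} ['date']
```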
How can WinWire help?
WinWire can help you automate the ingestion of paper forms. We can make your forms OCR-friendly, prepare and fine-tune AI models, and even post-process the extracted data to improve its accuracy. At the very least, Form Recognizer can help with data entry, but many more opportunities open up once data is available in digital form. We can help you realize opportunities that never surfaced before because the data depended on manual entry.
WinWire recently helped one of the largest diversified U.S.-based construction companies drive operational excellence and elevate its staff experience by digitizing its physical forms with Azure Form Recognizer.