The value of Robotic Process Automation on unstructured document classification tasks, and how to start

Robotic Process Automation (RPA) is no longer science fiction. RPA is a reality and is gaining the interest of several organizations that want to digitize their operations (procure-to-pay, quote-to-cash, human resources administration, claims processing, remittances, mortgages and other back-office processes) and create a laborless, more efficient and paperless content management environment.

These organizations wish to be at the forefront, reduce administrative costs and focus on tasks that really matter and create value. They are already making RPA part of their document classification automation strategy, taking advantage of a solution that replicates human actions.

We are not talking about future predictions. We are talking about an emerging way of fully automating complex and routine operations with a digital, virtual workforce of software robots, or artificial intelligence (AI), that is changing the way people and organizations work.

Although this is a huge advance, the fact is that for some organizations the digital workforce platform is still an implausible dream. Not only because they aren’t using fast-moving RPA technology, but also because they aren’t aware of new digital techniques for extracting value from the unstructured data they deal with every day. Images, files, emails, forms and photos, among others, are not being automatically integrated into core systems and still require human intervention for correct classification, transformation, and validation.

RPA allows software “robots” to automate these mundane, labour-intensive activities carried out by humans, such as repetitive, time-consuming and error-prone manual data entry. Because these activities don’t bring value to the customer (they are essentially time-consuming costs), it is essential to automate them and free employees for more important and valued tasks.

Robotics applied to advanced document classification and data extraction is here and ready to make the dream a reality.


What is Robotic Process Automation

The Institute for Robotic Process Automation defines RPA as “the application of technology that allows employees in a company to configure computer software or a ‘robot’ to capture and interpret existing applications for processing a transaction, manipulating data, triggering responses and communicating with other digital systems.”

RPA is positioned on top of an organization’s existing technology, meaning that it is both complementary to core systems and non-disruptive for day-to-day business.


Information ready to be consumed – RPA at the entry point

Numbers confirm this dream: according to a study by PMG IT, 98% of respondents view process automation as vital to driving business benefits in today’s corporate environment.

How? By automating the integration of unstructured data into core systems. In other words, by starting automation at the entry point, at the first document processing stage, and giving robots information that is ready to be consumed! This is exactly what RPA is already enabling: the conversion of unstructured data into workable data that is correctly classified, processed, validated, compared and converted into valuable information… transformed into real knowledge. All of this is automated and doesn’t need human intervention. Only the occasional exception that falls outside the norm (fallout) is flagged to a human as an alert; that person validates it and returns a solution. The system learns from this human interaction on atypical cases. The next time, it can handle the same case on its own, moving from 99.99% accuracy towards 100% and making it just one more automated task alongside the usual ones. It replicates what the human does: the steps and the clicks.
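The fallout loop described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the `FalloutRouter` class, the confidence threshold of 0.9 and the idea of a document "fingerprint" are hypothetical names chosen for this example, not part of any particular RPA product.

```python
class FalloutRouter:
    """Sends confident results straight through and flags the rest to a human."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold   # assumed cut-off for "normal" cases
        self.learned = {}            # fingerprint -> label confirmed by a human
        self.review_queue = []       # fingerprints waiting for a human decision

    def route(self, fingerprint, predicted_label, confidence):
        # A case a human has already resolved is handled automatically.
        if fingerprint in self.learned:
            return ("automated", self.learned[fingerprint])
        # Normal cases need no human intervention.
        if confidence >= self.threshold:
            return ("automated", predicted_label)
        # Fallout: raise an alert by queuing the case for review.
        self.review_queue.append(fingerprint)
        return ("human_review", None)

    def resolve(self, fingerprint, human_label):
        # Remember the human's answer so the same case is automated next time.
        self.learned[fingerprint] = human_label
        if fingerprint in self.review_queue:
            self.review_queue.remove(fingerprint)


router = FalloutRouter()
print(router.route("doc-a", "invoice", 0.97))
print(router.route("doc-b", "contract", 0.40))
router.resolve("doc-b", "claim")
print(router.route("doc-b", "contract", 0.40))
```

A real system would learn something more general than an exact lookup, but the flow is the same: automate the usual cases, alert a human on the exception, and reuse the answer.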


Giving chaos a meaning and automatically transforming unstructured into structured data

The magic is empowering unstructured data with Robotic Process Automation. It would be easy if the world’s information were based on structured data like questionnaires, ballots, etc. or even semi-structured documents like invoices or shipping orders, but that’s not the reality. Complex working processes and siloed data rule the world. But there is a solution: RPA, which is here to help you organize information chaos, not only by automatically extracting data from unstructured documents like letters, attachments and contracts but also by triggering a call to action or a business order, then validating, organizing and delivering data to a digital workforce that inserts it into the systems. In order to gain competitive advantage, you need to efficiently automate processes and people, as a whole. If you need lots of people to prepare information for robots, what’s the benefit?


How to start – Robotic Process Automation in action

The first kilometre is the hardest! Managing critical processes at the entry point is vital to increase the productivity of operational teams, reduce costs, and create knowledge. So that’s where you should start implementing RPA solutions and focus your digitizing efforts: right at the beginning, with a tool that covers the whole information treatment process and prepares data for Robotic Process Automation. A tool that helps you acquire data, recognize and understand documents, classify information, process texts and documents, and search and distil information.


OCR (machine print recognition) and ICR (handprint recognition)

Typical business documents like product registration/purchase or service subscription forms (surveys, banking forms, etc.) are structured. These forms follow general layout patterns, so rules can be defined about where to look for certain pieces of information. Structured documents have a defined geometric region (template) for each piece of data and are “theoretically” easier to work with using document recognition tools like OBR (bar code recognition) for document separation, identification or sorting, standard OCR, or OMR (checkmark recognition).

Even semi-structured documents like invoices or delivery notes follow specific layouts in which the most commonly extracted information is usually placed in predictable areas, like a header or a footer. Even the table describing the purchased items (table line items) is usually placed in an area that is easier to configure.

With unstructured documents, the challenges are quite different. No pattern exists, and we are not able to work with pre-defined templates. Advanced image, text and document recognition, specific fault-tolerance features and even applied linguistics are mandatory, and they bring a new complexity to the digital transformation process.
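To make the template idea for structured forms concrete, here is a minimal sketch. It assumes an OCR step has already returned words with page coordinates; the `Word` type, the `TEMPLATE` regions and the field names are all hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # horizontal position on the page (assumed units)
    y: float  # vertical position on the page (assumed units)


# Hypothetical template: field name -> region as (x0, y0, x1, y1).
TEMPLATE = {
    "customer_name": (100, 40, 400, 60),
    "account_number": (100, 80, 400, 100),
}


def extract_fields(words, template):
    """Collect the words that fall inside each field's region."""
    fields = {}
    for name, (x0, y0, x1, y1) in template.items():
        inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        inside.sort(key=lambda w: (w.y, w.x))  # reading order: top-down, left-right
        fields[name] = " ".join(w.text for w in inside)
    return fields


words = [
    Word("Name:", 20, 50),
    Word("Maria", 110, 50),
    Word("Silva", 160, 50),
    Word("PT50-0002", 110, 90),
]
print(extract_fields(words, TEMPLATE))
```

This works only because the form always puts the same data in the same place. An unstructured letter or contract offers no such regions, which is why the template approach stops here.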

Automated recognition

Extracting data from a business document typically involves several steps. In the first step, a text recognition tool like OCR is used to turn the pixels into individual characters. In the second step, meaningful units like amounts, references, dates, text or numbers are identified. In step three, the most plausible hypothesis for the information searched is identified, typically based on contextual information. Step four normalizes the different writing styles for the information, and the last step is logical validation. Although depicted as a sequence, the steps really run in cycles, following many hypotheses in parallel.
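Steps two to five can be sketched in a few lines. This is a simplified illustration, not a production extractor: it assumes step one (OCR) has already produced the sample text, that amounts are written with two decimal places, and that the validation rule is "the other amounts add up to the total". The keyword scores and all function names are assumptions made for this example.

```python
import re
from decimal import Decimal

# Assumed output of step 1 (OCR) for a made-up invoice.
OCR_TEXT = """Invoice No. 2024-118
Date: 03/02/2024
Subtotal 1.200,00
VAT 23% 276,00
Total due: 1.476,00 EUR"""

# Amounts with optional thousands groups and exactly two decimals.
AMOUNT_RE = re.compile(r"\d{1,3}(?:[.,]\d{3})*[.,]\d{2}")


def find_candidates(text):
    """Step 2: locate amounts and keep their line as context."""
    return [
        (match.group(), line.lower())
        for line in text.splitlines()
        for match in AMOUNT_RE.finditer(line)
    ]


def score(context):
    """Step 3: rate how plausible it is that this line holds the total."""
    points = 0
    if "total" in context:
        points += 2
    if "subtotal" in context:
        points -= 2
    if "due" in context:
        points += 1
    return points


def normalize_amount(raw):
    """Step 4: unify writing styles such as 1.476,00 and 1,476.00."""
    digits = re.sub(r"[.,]", "", raw)
    return Decimal(digits[:-2] + "." + digits[-2:])


def extract_total(text):
    candidates = find_candidates(text)
    if not candidates:
        return None, False
    best = max(range(len(candidates)), key=lambda i: score(candidates[i][1]))
    total = normalize_amount(candidates[best][0])
    others = [normalize_amount(raw) for i, (raw, _) in enumerate(candidates) if i != best]
    # Step 5: logical validation (assumed rule: the parts add up to the total).
    is_valid = sum(others, Decimal("0")) == total
    return total, is_valid


print(extract_total(OCR_TEXT))
```

A failed validation would send the document back through the earlier steps with the next most plausible hypothesis, which is the cycle described above.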


With the right, ready-to-consume information in place, RPA replaces labor-intensive, multi-step tasks across multiple systems and data sources, including:

  • Customer or employee onboarding.
  • Regulatory compliance reporting.
  • Order scheduling & tracking of shipments.
  • Loan application opening.
  • Supply chain management.
  • Insurance claims handling.
  • Financial account aggregation.


Here are four ideas you should retain on Robotic Process Automation:

  • It enriches and complements, rather than replaces, existing systems.
  • It liberates employees from repetitive, non-valuable tasks.
  • It is only possible to implement with structured data (first, you must convert unstructured to structured data).
  • It eliminates human error and learns from human behavior, acting and performing the same way every time.

