Data extraction as a service - why and how?

Efficiency is a hot topic for both IT (Information Technology) and LoB (Line of Business) managers, but the outcomes can be tricky to measure. Organizations exchange unstructured data (images, files, emails, etc.) during their day-to-day activities, but this information is not ready to be “consumed” by corporate systems (ERP, CRM, HRIS, legacy, tailor-made, etc.) and compiled for business decisions. Corporate systems are usually called “silos” because the only way to enter information is through form filling. Mapping the metadata trapped inside documents into those systems and collecting signatures are the reasons documents end up printed, and why we still keep paper around…

Data capture/extraction and workflow – the accuracy commitment

There is a huge difference between the traditional approach and the ‘Data Extraction as a Service’ approach. The first includes an on-premises package of software, hardware and people (internal or outsourced), but with only a short-term commitment to data accuracy. The second includes a cloud-based package of software, hardware, capture, classification and integration services (aggregating technology, people, processes, controls, compliance, and methodology) with a long-term commitment to data accuracy.

In the traditional model, the IT consulting team brings in image capture and extraction software to retrieve metadata from documents and export the semi-structured information into corporate systems. The configuration is based on sample documents and is optimized for specific scenarios. Clerical operators validate misread characters and manually index the correct data to resolve fallouts. When the transition is complete, the IT consulting team leaves, a minimal team of operators stays to deal with fallouts, and everything looks fine. Or so you think…

Why does the traditional model fail?

The main issue is that this is much more than a technological problem, and the reality is far more complex:

  • Semi-structured template management fallouts – these types of semi-structured documents (invoices, forms, purchase orders, credit notes) don’t have a common template. The content may be in several places, and no single rule can be applied to every document. There are best practices for extracting the best quality, but they must always be built on an improvement framework. No magic pill works for all cases.
  • Hidden maintenance costs to ensure performance delivery – the consulting company installs capture software prepared for structured or semi-structured documents. Perfect, right? Not so fast… Invoice layouts look more or less alike, but in reality everything changes and there are many ifs and buts. The initial team takes “x” suppliers and sets up a template for each one in that scope to extract data accurately. When the project ends, efficiency is at 90% (the remaining 10% is information the system can’t extract automatically, so it is manually indexed), and that percentage keeps decreasing over time as new suppliers and document layouts enter the system. It then becomes necessary to buy additional consulting packages to configure and add other templates, business rules, etc. The process is not fluid; in reality, the company bought something that does not fit its business. Typically we want an SLA (Service Level Agreement) of 24/48 hours, not 1 or 2 months, for something as critical as invoice processing. These are hidden costs that you don’t know about at first. A continuous improvement approach is essential to keep the promised efficiency.
  • Hidden hardware/software costs – besides the capture software and the outsourcing team that manually indexes incorrect data, there are costs that make the whole project more expensive, such as the acquisition of new servers (meeting the requirements to install the software) and additional software licenses (to use the framework on which the software is based).
  • No methodology or proper risk assessment for critical business activities: BCP/DRP – What happens if the scanner fails? Is there another one as a contingency? It’s imperative to anticipate the risk, because this is an operational framework for critical business processes, such as invoice processing or payroll. They can’t just stop. Regarding the Business Continuity Plan (BCP): if the server is down, is there software redundancy? There are payment deadlines, fines, penalties, crucial services, and commercial discounts at stake. Regarding the Disaster Recovery Plan (DRP): what happens if there is an accident or a flood in the digital mailroom? These processes are so critical that such scenarios need to be covered when deciding to implement a solution. It’s not only a question of technology.
  • Records Management and Retention Schedules – Transactional information is not the only thing that matters in these processes. Records Management is necessary to manage documents as digital/physical assets. They must be stored for the legal/regulatory periods, and personal data, if any exists, must be managed properly to avoid a data breach. Retention schedule rules are mandatory for proper corporate governance.
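
The efficiency erosion described under hidden maintenance costs can be illustrated with simple arithmetic. The following is a minimal sketch, not a measurement: the function name, the monthly volumes, and the assumption that every document from a layout without a template goes to manual indexing are all hypothetical. Only the 90% starting figure comes from the text above.

```python
def automation_rate(templated_docs: int, untemplated_docs: int,
                    template_hit_rate: float = 0.90) -> float:
    """Share of all documents extracted without manual indexing.

    Assumption (hypothetical): documents whose layout has no template
    are never extracted automatically.
    """
    total = templated_docs + untemplated_docs
    if total == 0:
        return 0.0
    return templated_docs * template_hit_rate / total


# Hypothetical monthly volumes: (docs from templated suppliers,
# docs from new suppliers or new layouts without a template).
monthly_volumes = [(1000, 0), (1000, 100), (1000, 250), (1000, 500)]
rates = [automation_rate(t, u) for t, u in monthly_volumes]

for (t, u), rate in zip(monthly_volumes, rates):
    print(f"{t + u:>5} docs, {u:>3} from new layouts: "
          f"{rate:.0%} extracted automatically")
```

Under these assumed volumes the automatic share falls from 90% to 60% without any change to the software itself, which is why template maintenance has to be part of the commitment.
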
What is the solution for all of this? Complete outsourcing with a ‘Data Extraction as a Service’ model.

Based on our 17 years of experience, the most efficient and affordable way to ensure consistent data accuracy, 100% control and 0% risk is to outsource the entire model: cloud-based capture software + people to classify fallouts + business processes and rules + methodology + integration into the core systems. You pay a fee per document and that’s it.

Extraction as a Service - Ikea example

This is why you don’t buy a car at Ikea!

We are one of the few companies in the whole world to offer ‘Data Extraction as a Service’. Why so few? Because it is really complex, and suppliers are very reluctant to commit and take all the risk on their side. You don’t buy a car at Ikea, right? It has too many parts. The logic is the same. Why do we do it? Because we can guarantee the quality of the deliverable, which is what matters most. We have the people, we have the software that learns from their input, and we have the expertise. You just send the documents and everything is integrated into your core systems via web services or APIs (interoperability connectors).

What are the advantages of the ‘Data Extraction as a Service’ model?
  • Outsourcing specialization: OCR/ICR/Auto-classification – In OCR (machine-printed data), documents arrive and are mapped into a library. In ICR (handwritten data), documents arrive, characters are isolated by filters and are then mapped into a library. Business logic is then applied to interpret and standardize the information. There are also auto-classification formats in which patterns are created.
  • Cloud/BPO/Automation/Robotics – As the model is cloud-based, pricing is per document and includes everything: infrastructure, licenses, extraction, BCP (Business Continuity Plan) and DRP (Disaster Recovery Plan). With BPO (Business Process Outsourcing), we ensure 99% accuracy, because we have the people to validate what the system fails to read, and we can monitor based on standards and alerts. The risk of delivering the right data is on our side. In other models, the cost of ensuring accuracy is paid as FTEs (full-time equivalents), and the provider has every interest in selling more resources. No efficiency framework can be built, because the goal is to provide more people. Why automation? As a transactional model, we standardize and optimize processes. We identify the target system, and then the information is entered through connectors. Robotics are scripts that define which clicks are made on the screen: the system records them, and for that kind of click it always performs a pre-defined action.
  • Commitment to delivery. Continuity of efficiency framework – the most important part. This model ensures that the commitment covers all the extracted metadata. The risk is passed to the vendor. Since the model is transactional, as long as the metadata to extract stays the same, the higher the volume, the lower the cost per unit. A real efficiency framework can be built regardless of new suppliers or new document templates. You only need to submit documents to the platform and we will deliver the metadata into your systems.
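
The idea of people validating what the system fails to read can be sketched as a routing step. This is an illustrative sketch only, not a description of any particular platform: the `ExtractedField` type, the per-field confidence score, the 0.95 threshold, and the sample invoice values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0; assumed to be reported by the capture engine


def split_fallouts(fields, threshold=0.95):
    """Split fields into (accepted, fallouts).

    Fields at or above the threshold are delivered as-is; the rest are
    fallouts that go to a human validator before delivery.
    """
    accepted = [f for f in fields if f.confidence >= threshold]
    fallouts = [f for f in fields if f.confidence < threshold]
    return accepted, fallouts


# Hypothetical metadata extracted from one invoice.
invoice = [
    ExtractedField("supplier_vat", "PT501234567", 0.99),
    ExtractedField("total_amount", "1250.00", 0.97),
    ExtractedField("due_date", "2O17-03-15", 0.62),  # letter O misread for zero
]

accepted, fallouts = split_fallouts(invoice)
for field in fallouts:
    print(f"needs validation: {field.name} = {field.value!r}")
```

In a per-document model, the cost of handling those fallouts sits with the vendor, so the threshold and the validation effort are the vendor's problem to optimize rather than the customer's.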

When selecting a data capture solution, always check the vendor’s commitment to deliverables (now and in the future), compliance in terms of retention schedules and records management, security regarding DRP and BCP, and any iceberg/hidden costs. Never forget that we are talking about critical business processes. It’s not a purely technological decision; people and processes must be added to the equation.

The point is: what is the most efficient, affordable and long-term way to accurately put the information into the core systems without depending on specific hardware or a specific machine? Now you know: ‘Data Extraction as a Service’.

– Check your data capture services efficiency and costs.
– Discover our Plans and Packages.
Contact us and ask for a proposal to see the power of ‘Data Extraction as a Service’ (fee per document).
Subscribe to our blog to receive valuable insights about document management and paper-free processes.
Download the Free Industry Watch Report by AIIM and Papersoft “Paper-free progress: measuring outcomes” and see how paper-free is improving productivity, accessibility and compliance, the progress you’ve made and how your organization compares to others.
