With the help of modern technologies, document processing is much faster and easier. You can now automate the entire invoice-payment cycle so that the operator intervenes only at the key decision points (approving payment of the invoice and choosing the payment date), while robots perform the rest of the process.
RPA (Robotic Process Automation) is a type of software that takes over a task a human being would otherwise perform within a process.
Using RPA tools, a company can configure software or a “robot” to capture and interpret applications or documents for processing a transaction, processing data, triggering responses, and communicating with other digital systems. RPA scenarios range from something as simple as generating an automated response to an email to deploying thousands of robots, each programmed to automate processes in an ERP system.
RPA performs repetitive tasks faster and more accurately than humans, freeing people for work that requires human strengths such as emotional intelligence, reasoning, judgment and customer interaction.
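To make the division of labor concrete, here is a minimal sketch of such an invoice-payment cycle, assuming a simplified invoice record. The class and function names (`Invoice`, `robot_validate`, `operator_decide`, `robot_schedule_payment`) are hypothetical and do not belong to any RPA product; a real robot would read the invoice from a document or an ERP system rather than from a constructor call.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Invoice:
    number: str
    supplier: str
    amount: float
    validated: bool = False
    approved: Optional[bool] = None
    payment_date: Optional[date] = None
    log: List[str] = field(default_factory=list)


def robot_validate(inv: Invoice) -> Invoice:
    """Robot step: routine checks that need no human judgment."""
    inv.validated = bool(inv.number) and bool(inv.supplier) and inv.amount > 0
    inv.log.append("robot: validated" if inv.validated else "robot: rejected")
    return inv


def operator_decide(inv: Invoice, approve: bool, pay_on: date) -> Invoice:
    """Human step: the two decisions left to the operator."""
    inv.approved = approve
    inv.payment_date = pay_on if approve else None
    inv.log.append("operator: approved" if approve else "operator: declined")
    return inv


def robot_schedule_payment(inv: Invoice) -> Optional[str]:
    """Robot step: turn an approved invoice into a payment instruction."""
    if not (inv.validated and inv.approved and inv.payment_date):
        return None
    inv.log.append("robot: payment scheduled")
    return (f"PAY {inv.amount:.2f} to {inv.supplier} "
            f"on {inv.payment_date.isoformat()} (ref {inv.number})")


# Usage: the operator only approves the invoice and picks the date
inv = robot_validate(Invoice("INV-001", "ACME", 1250.0))
operator_decide(inv, approve=True, pay_on=date(2024, 5, 31))
print(robot_schedule_payment(inv))  # PAY 1250.00 to ACME on 2024-05-31 (ref INV-001)
```

Everything except `operator_decide` runs unattended, which mirrors the idea that the human intervenes only at the decision points.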
Optical Character Recognition (OCR) is the electronic identification and digital encoding of printed text using an optical scanner and specialized software. Using OCR software allows a computer to read images of static text and turn them into searchable, editable data.
OCR is widely used for data entry from printed records such as identity documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data or any other suitable documentation. It is the usual method of digitizing printed text so that it can be edited electronically, searched, stored more compactly, displayed online and used in automated processes such as cognitive computing, machine translation, text-to-speech conversion, and key data and text extraction.
Optical character recognition involves three steps:
– Opening and/or scanning a document in the OCR software
– Recognizing the text in the document
– Saving the document produced by OCR in a …
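The three steps (open or scan, recognize, save) can be sketched as a small pipeline. The recognition engine is passed in as a plain function: this is a hypothetical stand-in for a real OCR engine such as IRIS OCR, whose programming interface is not described here, and the result is saved as plain text only for simplicity.

```python
from pathlib import Path
from typing import Callable

# Hypothetical engine signature: raw image bytes in, recognized text out
Recognizer = Callable[[bytes], str]


def ocr_document(src: Path, dst: Path, recognize: Recognizer) -> str:
    # Step 1: open the document image (or the file produced by the scanner)
    image_bytes = src.read_bytes()
    # Step 2: run character recognition on the image
    text = recognize(image_bytes)
    # Step 3: save the recognized result in an editable format (plain text here)
    dst.write_text(text, encoding="utf-8")
    return text
```

Keeping the engine as a parameter makes it easy to swap engines or to test the pipeline with a fake recognizer.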
The image module
Load and save images in formats such as BMP, PNG, TIFF, PDF and JPEG. JPEG2000 and JBIG2 compression are available in a separate extension.
Preprocessing options
Clean up the original images with features such as adaptive binarization, despeckle filters, deskewing and document rotation. Dark border removal, line removal and color dropout are available in a separate extension.
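Adaptive binarization and despeckling are standard preprocessing techniques. The sketch below shows a textbook version of each (local-mean thresholding and removal of isolated ink pixels) on a small grayscale grid; it is not IRIS's implementation, and `radius` and `offset` are illustrative parameters.

```python
from typing import List

Gray = List[List[int]]    # grayscale image, 0 = black, 255 = white
Binary = List[List[int]]  # 1 = ink, 0 = background


def adaptive_binarize(img: Gray, radius: int = 1, offset: int = 10) -> Binary:
    """Mark a pixel as ink if it is darker than its local mean minus `offset`."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighborhood of the pixel, clipped at the image borders
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if img[y][x] < local_mean - offset else 0
    return out


def despeckle(bw: Binary) -> Binary:
    """Remove ink pixels that have no ink among their 8 neighbors."""
    h, w = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    for y in range(h):
        for x in range(w):
            if bw[y][x] == 1:
                neighbors = sum(
                    bw[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))
                    if (j, i) != (y, x)
                )
                if neighbors == 0:
                    out[y][x] = 0
    return out


# Unevenly lit scan: left background 200, right background 120, two dark strokes
scan = [
    [200, 200, 200, 120, 120, 120],
    [200, 50, 200, 120, 20, 120],
    [200, 50, 200, 120, 20, 120],
    [200, 200, 200, 120, 120, 120],
]
print(despeckle(adaptive_binarize(scan)))
# [[0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 1, 0], [0, 1, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0]]
```

A single global threshold of 128 would mark the whole darker right half as ink; comparing each pixel with its own neighborhood keeps only the strokes.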
Recognizes 137+ languages
Recognition covers 137+ languages, with optional add-ons for Asian languages, Hebrew, Arabic, bank fonts and ICR.
Our barcode recognition module recognizes popular 1D barcodes such as Code 39, Code 128, EAN and UPC. An additional extension decodes 2D barcodes: PDF417, QR Code and Data Matrix.
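A decoded barcode value is usually validated with its built-in check digit before it is used. As an illustration independent of any IRIS API, this sketch verifies the standard EAN-13 check digit, where the first 12 digits are weighted alternately 1 and 3.

```python
DIGITS = "0123456789"


def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit from the first 12 digits."""
    if len(first12) != 12 or any(c not in DIGITS for c in first12):
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """True if a decoded 13-digit EAN string carries the correct check digit."""
    if len(code) != 13 or any(c not in DIGITS for c in code):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


print(is_valid_ean13("4006381333931"))  # True
print(is_valid_ean13("4006381333932"))  # False: last digit misread
```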
Saving documents in several formats
The output formats of documents in IRIS OCR are:
PDF, PDF/A, HTML, XML, RTF, TXT, ODT, WordML, SpreadsheetML, CSV, DOCX, XLSX and XPS. An additional compression module uses our iHQC technology to generate compressed PDF and XPS files.
– Zonal recognition
– Automatic page orientation recognition
– Automatic correction of the perspective of the images of the documents captured by the camera
– Automatic removal of punch holes
– Document separation: a blank page or barcode placed between documents tells the OCR software to create separate output files from a single batch
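Splitting a batch on separator sheets can be sketched as follows. A page counts as a separator if it is blank or carries a separator barcode; the barcode value `SEP` and the `ScannedPage` type are hypothetical choices for this example, not IRIS settings.

```python
from dataclasses import dataclass
from typing import List, Optional

SEPARATOR_CODE = "SEP"  # hypothetical value printed on separator sheets


@dataclass
class ScannedPage:
    text: str
    barcode: Optional[str] = None


def is_separator(page: ScannedPage) -> bool:
    """A page is a separator if it is blank or carries the separator barcode."""
    blank = not page.text.strip() and page.barcode is None
    return blank or page.barcode == SEPARATOR_CODE


def split_batch(pages: List[ScannedPage]) -> List[List[ScannedPage]]:
    """Split one scanned batch into documents, one per output file."""
    documents: List[List[ScannedPage]] = []
    current: List[ScannedPage] = []
    for page in pages:
        if is_separator(page):
            if current:  # consecutive separators do not create empty documents
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents
```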
Cursive handwriting cannot be recognized with OCR technology, because optical character recognition is designed only for printed text.
Handwritten text can be recognized only if the characters are written separately (“hand-printed text”). This recognition scenario is called ICR (Intelligent Character Recognition) and is most often used for:
– Zonal recognition (OCR, ICR)
– Form processing
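Zonal recognition means running a recognition engine only on predefined regions of a form, with OCR for printed zones and ICR for hand-printed ones. The sketch below uses a toy character grid as the "image" and fake engines; the `Zone` type and the engine mapping are hypothetical and only show how zones, cropping and engine choice fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Image = Sequence[Sequence[str]]          # toy 2D "image": one cell per pixel
Engine = Callable[[List[List[str]]], str]


@dataclass
class Zone:
    name: str
    x: int        # left column
    y: int        # top row
    width: int
    height: int
    engine: str   # "OCR" for printed text, "ICR" for hand-printed text


def crop(img: Image, z: Zone) -> List[List[str]]:
    """Cut the zone's rectangle out of the image."""
    return [list(row[z.x:z.x + z.width]) for row in img[z.y:z.y + z.height]]


def recognize_zones(img: Image, zones: List[Zone], engines: Dict[str, Engine]) -> Dict[str, str]:
    """Run the engine configured for each zone on that zone's pixels only."""
    return {z.name: engines[z.engine](crop(img, z)) for z in zones}
```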
What resolution should the image have?
We recommend that documents be scanned at a resolution of 300 dpi.
– For regular texts (font size 8-10 points) it is recommended to use the resolution of 300 dpi for OCR
– A lower resolution will degrade both quality and speed
– For font sizes less than 8 points, a resolution of 400-600 dpi is recommended
– Font sizes from 12 to 20 points give the best quality and speed
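The reason small fonts need more resolution is simple arithmetic: a point is 1/72 inch, so a character's nominal height in pixels is font size × dpi ÷ 72. At 300 dpi a 10-point font is about 42 pixels tall, while a 6-point font is only 25 pixels tall. The helper names below are illustrative; `recommended_dpi` just encodes the recommendations listed above.

```python
from typing import Tuple


def char_height_px(font_pt: float, dpi: int) -> float:
    """Nominal font height in pixels (1 point = 1/72 inch)."""
    return font_pt * dpi / 72


def recommended_dpi(font_pt: float) -> Tuple[int, int]:
    """(min, max) scan resolution following the recommendations above."""
    if font_pt < 8:
        return (400, 600)  # small print needs more pixels per character
    return (300, 300)


for pt in (6, 10, 14):
    print(pt, recommended_dpi(pt), round(char_height_px(pt, 300), 1))
```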
Characters printed on gray or colored backgrounds can lead to recognition errors, because the background makes the characters harder to read. However, thanks to our state-of-the-art technology, colors are interpreted separately and can be removed during recognition where they overlap characters. We therefore recommend scanning in color if the documents contain colored areas. Even if the documents are only black and white, we still recommend color scanning to keep the workflow consistent, because the speed difference between color and black-and-white scanning is minimal.
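Interpreting colors separately can be illustrated with a simple color-dropout step: strongly saturated pixels (a colored background or stamp) are turned white before binarization, while dark or near-neutral ink is kept as grayscale. This is a generic technique, not IRIS's algorithm, and the saturation and brightness thresholds are arbitrary illustrative values.

```python
import colorsys
from typing import List, Tuple

RGB = Tuple[int, int, int]


def drop_color(pixels: List[List[RGB]], sat_threshold: float = 0.35) -> List[List[int]]:
    """Return a grayscale image in which saturated, bright pixels become white."""
    out = []
    for row in pixels:
        gray_row = []
        for r, g, b in row:
            _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            if s > sat_threshold and v > 0.3:
                gray_row.append(255)  # colored background or marking: drop it
            else:
                # ITU-R BT.601 luma for ink and other near-neutral pixels
                gray_row.append(round(0.299 * r + 0.587 * g + 0.114 * b))
        out.append(gray_row)
    return out
```

Because the decision uses saturation rather than plain brightness, a pale yellow background disappears while dark blue ink survives, which a black-and-white scan could not distinguish.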
IRIS – OCR Technology Provider
The name IRIS comes from Image Recognition Integrated Systems. At IRIS, we build software and products that help people increase their productivity while scanning and converting documents. We facilitate the scanning, editing and sharing of digital files.
IRIS Products & Technologies Division, part of the IRIS Group, develops technologies and products for intelligent document recognition and sells its portfolio worldwide through strong partnerships. The partner network is one of the three pillars of the IRIS Products & Technologies Division, together with OEM partners and its own solutions.
Less paper, more content is our motto. As every motto should, it summarizes what our solutions aim for. To make a long story short, we were among the pioneers in the field of text recognition, and we have mastered it. Today we are pioneers in solutions that bring you the real value your money deserves: the content of documents. Information is today's gold, and it is found both on paper and in files; we extract it from both so that you can exploit their content at the lowest cost and with the highest return on investment.
PolCo is a unique technology for identifying compact areas of information, developed by DigiSinergy, which uses a set of specific algorithms.
Based on reverse engineering of how documents are printed, it uses AI mechanisms to identify areas that contain correlated data, much as the human brain does when reading. Words set in similar characters and fonts are combined into groups/areas. PolCo can also use user-defined polygonal shapes to group certain areas of the document, even if they are not adjacent, which makes it a much more powerful identification tool than the human eye.
Using PolCo, a document can be immediately divided into independent areas, each of which can be processed separately. PolCo accurately identifies areas that contain information, without mixing information with different meanings or from adjacent areas.
Separates the document into independent areas
Identifies areas based on how they are written
Identifies groups within areas defined with ADD technology
Identifies similar areas using polygons and processes them as a single unit
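PolCo's algorithms are not published, so the sketch below only illustrates two generic ingredients the description mentions: grouping words that share the same font and size, and assigning words to user-defined polygons (one area may consist of several non-adjacent polygons) with the standard ray-casting point-in-polygon test. The `Word` type and all function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


@dataclass
class Word:
    text: str
    x: float      # center of the word's bounding box
    y: float
    font: str
    size: float


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray casting: count how many edges a horizontal ray from p crosses."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def group_by_font(words: List[Word]) -> Dict[Tuple[str, float], List[str]]:
    """Words printed in the same font and size end up in the same group."""
    groups: Dict[Tuple[str, float], List[str]] = defaultdict(list)
    for w in words:
        groups[(w.font, w.size)].append(w.text)
    return dict(groups)


def group_by_polygons(words: List[Word], areas: Dict[str, List[Polygon]]) -> Dict[str, List[str]]:
    """Each named area may span several non-adjacent polygons, handled as one unit."""
    return {
        name: [w.text for w in words
               if any(point_in_polygon((w.x, w.y), poly) for poly in polys)]
        for name, polys in areas.items()
    }
```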
Adaptive Document Data Recognition
ADD is a unique document recognition technology developed by DigiSinergy that uses a set of specific algorithms. Based on the form of the document and using short learning processes, easily performed by the operator, ADD extrapolates the information and adapts to the new situations it encounters, managing to correctly identify the information.
When information is structured in areas specific to a document (header, body, footer, etc.), ADD recognizes each area and extracts the necessary information. Together with PolCo technology, ADD adapts to multi-page documents, even if their pages have different layouts, and selects the necessary information. It gives excellent results when reading lines of information, even when the layout differs between documents. For ADD, moving from one page to another is not a problem, even if the pages do not share the same layout. ADD discards all residual content that carries no information, adapting to each page and even to lines of information that span several pages.
Processes multi-page documents whose pages have different layouts
Processes documents containing characters in various fonts and sizes
Processes a block of information even if it spans several pages
Ignores areas without important information, even if they intersect the information areas
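ADD's learning algorithms are proprietary and not described here. As a rough illustration of the behavior listed above, this sketch splits each page into header, body and footer bands by vertical position, reads line items only from the body, joins a wrapped description even when it continues on the next page, and ignores residual lines. The band fractions, the line-item pattern and the "indented line continues the previous item" rule are hypothetical assumptions standing in for what ADD would learn from the operator.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical line-item layout: description, quantity, amount with 2 decimals
ITEM = re.compile(r"^(?P<desc>.+?)\s+(?P<qty>\d+)\s+(?P<amount>\d+\.\d{2})$")


@dataclass
class PageText:
    height: float
    lines: List[Tuple[float, str]]  # (distance from top of page, line text)


def split_regions(page: PageText, header_frac: float = 0.15, footer_frac: float = 0.10):
    """Assign each line to the header, body or footer band by its vertical position."""
    header, body, footer = [], [], []
    for y, text in page.lines:
        if y < page.height * header_frac:
            header.append(text)
        elif y > page.height * (1 - footer_frac):
            footer.append(text)
        else:
            body.append(text)
    return header, body, footer


def extract_items(pages: List[PageText]) -> List[Tuple[str, int, str]]:
    """Collect line items from every page's body, joining wrapped descriptions."""
    items: List[list] = []
    for page in pages:
        _, body, _ = split_regions(page)  # headers/footers repeat on each page
        for text in body:
            m = ITEM.match(text.strip())
            if m:
                items.append([m["desc"], int(m["qty"]), m["amount"]])
            elif text.startswith(" ") and items:
                # Indented line: wrapped description, possibly on the next page
                items[-1][0] += " " + text.strip()
            # Any other body line is treated as residue and ignored
    return [tuple(item) for item in items]
```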
BlueMachine technology
BlueMachine is a unique technology based on Machine Learning and Artificial Intelligence, developed by DigiSinergy, which uses a set of specific algorithms to add information contained in a list to a data set extracted from a document (or obtained in any other way).
BlueMachine is an AI engine that learns from the operator or from historical data and then takes over the operator's tasks, adding the necessary fields to each line of information. It learns from every operator intervention and warns the operator when the operator's behavior changes, but it adopts the new behavior when the operator confirms it.
BlueMachine can therefore be used in workflows with two levels of competence, level 1 and level 2: if information entered at level 1 is corrected at level 2, BlueMachine will flag the next similar erroneous entry at level 1, preventing the mistake from recurring. In this way BlueMachine also teaches the level 1 operator to enter data in line with the level 2 operator's recommendations. The result is accurate data, with a much lower error rate than a human operator alone achieves.
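BlueMachine's actual model is not described here. The following is only a minimal frequency-based sketch of the level 1 / level 2 loop: the reviewer's corrections are recorded as examples, the learner suggests a value for a line, and it flags a level 1 entry that contradicts what it has learned. The class name, the key format (supplier plus item description) and the account codes are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


class CorrectionLearner:
    """Learns field values from level 2 decisions and checks level 1 entries."""

    def __init__(self, min_votes: int = 1):
        self.history: Dict[str, Counter] = defaultdict(Counter)
        self.min_votes = min_votes

    def learn(self, key: str, value: str) -> None:
        """Record the value confirmed or corrected by the level 2 operator."""
        self.history[key][value] += 1

    def suggest(self, key: str) -> Optional[str]:
        """Most frequently confirmed value for this line, if seen often enough."""
        counts = self.history.get(key)
        if not counts:
            return None
        value, votes = counts.most_common(1)[0]
        return value if votes >= self.min_votes else None

    def check(self, key: str, entered: str) -> Optional[str]:
        """Return a warning if a level 1 entry contradicts the learned value."""
        expected = self.suggest(key)
        if expected is not None and entered != expected:
            return f"'{entered}' differs from learned value '{expected}' for '{key}'"
        return None


bm = CorrectionLearner()
bm.learn("ACME|Steel bolts", "6010")  # level 2 corrected a level 1 entry
print(bm.check("ACME|Steel bolts", "6020"))
```

If the level 2 operator repeatedly confirms a different value, it overtakes the old one in the counts, so the learner adopts the new behavior only on the operator's indication.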
SmartCorrect is a set of specific algorithms, developed by DigiSinergy, used to identify errors in a document. These errors usually result from the OCR process, but they can also originate when the document itself is created.
Errors are identified both from document-specific features and from general rules. These rules can be defined within the SmartCorrect module and by the operator while using the application. Any error found is clearly signaled to the user.
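SmartCorrect's rule set is not published. The sketch below shows the two kinds of checks the text describes, using illustrative rules chosen for this example: a general rule (in numeric fields, repair letters OCR often confuses with digits, such as O for 0) and a document-specific rule (the line amounts must add up to the stated total). Each issue is returned as a message to display to the user; the function names are hypothetical.

```python
from decimal import Decimal, InvalidOperation
from typing import List, Optional

# General rule (illustrative): letters OCR commonly confuses with digits
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})


def fix_numeric(field: str) -> Optional[Decimal]:
    """Repair look-alike characters in an amount field; None if still not a number."""
    try:
        return Decimal(field.strip().translate(DIGIT_LOOKALIKES))
    except InvalidOperation:
        return None


def check_document(line_amounts: List[str], total: str) -> List[str]:
    """Apply the general rule to each field, then the document rule to the total."""
    problems: List[str] = []
    values: List[Decimal] = []
    for i, raw in enumerate(line_amounts, 1):
        value = fix_numeric(raw)
        if value is None:
            problems.append(f"line {i}: '{raw}' is not a valid amount")
            continue
        if str(value) != raw.strip():
            problems.append(f"line {i}: '{raw}' corrected to {value}")
        values.append(value)
    stated = fix_numeric(total)
    if stated is None:
        problems.append(f"total: '{total}' is not a valid amount")
    elif len(values) == len(line_amounts) and sum(values) != stated:
        problems.append(f"total: lines add up to {sum(values)}, document says {stated}")
    return problems
```

`Decimal` is used instead of `float` so that amounts such as 12.50 compare exactly when the total is checked.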