OCR (Optical Character Recognition) is a technology that enables a computer to read static images of text and convert them into editable, searchable data. OCR involves the electronic identification and digital encoding of printed text using an optical scanner and specialized software.
It is a common method for digitizing printed text so that it can be edited and searched electronically, stored more compactly, displayed online, and used in automated processes such as cognitive computing, automatic translation, text-to-speech conversion, key data extraction, or text extraction.
OCR is widely used for converting data from paper-based records (identity documents, invoices, bank statements, computerized receipts, business cards, mail, or any other relevant documentation) into machine-readable text data.
Working with OCR software involves three steps:
Opening and scanning a document in the OCR software
Recognizing the document within the software
Saving the document produced by the software in a format of your choice
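The three steps above can be sketched in a few lines of Python. This is only an illustration of the workflow, not the IRIS API: the `Recognizer` type, `run_ocr`, `export`, and `fake_recognizer` are hypothetical names, and the recognizer is a stand-in that returns fixed text instead of analyzing an image.

```python
import csv
import html
import io
from typing import Callable, List

# Hypothetical recognizer type: takes raw image bytes, returns recognized lines.
Recognizer = Callable[[bytes], List[str]]


def export(lines: List[str], fmt: str) -> str:
    """Serialize recognized lines into a simple text-based output format."""
    if fmt == "txt":
        return "\n".join(lines)
    if fmt == "html":
        # Escape characters such as "<" so the text stays valid HTML.
        body = "".join(f"<p>{html.escape(line)}</p>" for line in lines)
        return f"<html><body>{body}</body></html>"
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        for line in lines:
            writer.writerow([line])  # one recognized line per row
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


def run_ocr(image: bytes, recognize: Recognizer, fmt: str = "txt") -> str:
    # Step 1: the scanned document arrives as image bytes.
    # Step 2: recognition turns the image into lines of text.
    lines = recognize(image)
    # Step 3: the result is saved in the format the user chose.
    return export(lines, fmt)


def fake_recognizer(image: bytes) -> List[str]:
    # Stand-in for a real OCR engine (assumption for this sketch).
    return ["Invoice 42", "Total: 10 < 20"]


print(run_ocr(b"", fake_recognizer, "html"))
```

Keeping recognition and export separate mirrors the workflow: the same recognized text can be saved in any of the supported formats without scanning again.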
Output document formats in IRIS OCR include PDF, PDF/A, HTML, XML, RTF, TXT, ODT, WordML, SpreadsheetML, CSV, DOCX, XLSX, and XPS. An additional compression module uses our iHQC technology to generate compressed PDF and XPS files.
OCR technology supports zone recognition and automatically detects page orientation. It also automatically corrects the perspective of photographed documents and can remove hole-punch marks.
OCR does not recognize cursive handwriting because "Optical Character Recognition" is defined only for printed text. Handwritten text can only be recognized if all characters are written separately ("manually printed text"). This recognition scenario is called ICR (Intelligent Character Recognition) and is most commonly used for zone recognition (OCR, ICR) and form processing.
We recommend scanning documents at a resolution of 300 dpi. This also applies to fonts of 8-10 points; for fonts smaller than 8 points, we recommend a resolution of 400-600 dpi.
A lower resolution degrades both recognition quality and speed. For the best quality and speed, we recommend a font size between 12 and 20 points.
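A short calculation shows why smaller fonts call for a higher resolution. Since one point is 1/72 of an inch, the nominal height of a line of text in pixels is the font size times the dpi divided by 72. The sketch below encodes the recommendations above; the function names are hypothetical, and the point size is the nominal (em) height, so individual letters are somewhat smaller.

```python
from typing import Tuple


def em_height_px(font_pt: float, dpi: int) -> float:
    """Nominal text height in pixels: 1 point = 1/72 inch."""
    return font_pt * dpi / 72.0


def recommended_dpi(font_pt: float) -> Tuple[int, int]:
    """Return the (min, max) scan resolution recommended in the text above."""
    if font_pt < 8:
        return (400, 600)  # fonts smaller than 8 points
    return (300, 300)      # 8 points and larger


for size in (6, 9, 12):
    low, high = recommended_dpi(size)
    print(size, "pt:", low, "-", high, "dpi,",
          em_height_px(size, low), "to", em_height_px(size, high), "px")
```

At 300 dpi, 12-point text has a nominal height of 50 pixels, while 6-point text has only 25. Scanning the 6-point text at 600 dpi brings it back to 50 pixels, which is the effect the higher recommendation for small fonts aims at.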
In general, characters written on gray or colored backgrounds can lead to recognition errors because these backgrounds make the characters harder to read. Thanks to our state-of-the-art technology, colors are interpreted separately and can be eliminated during the recognition process if they overlap characters.
Therefore, our recommendation is to scan in color if documents have colored areas. However, even if the documents are only black and white, we still recommend scanning in color to maintain a smooth workflow (the speed difference between color scanning and black and white scanning is minimal).
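To illustrate why color information helps, here is a minimal color drop-out sketch. It is not the IRIS algorithm, only a generic technique under simple assumptions: the image is a list of rows of RGB tuples, and the function name and both thresholds are hypothetical values chosen for the example. A pixel counts as ink only if it is both dark and nearly colorless; strongly colored pixels are turned into white background.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def drop_color(pixels: List[List[Pixel]],
               chroma_threshold: int = 60,
               dark_threshold: int = 128) -> List[List[int]]:
    """Binarize an RGB image, treating saturated colors as background.

    Returns rows of 0 (ink) or 255 (background).
    """
    result = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            # Chroma: spread between the strongest and weakest channel.
            chroma = max(r, g, b) - min(r, g, b)
            # Approximate perceived brightness (integer luma).
            luma = (299 * r + 587 * g + 114 * b) // 1000
            is_ink = chroma < chroma_threshold and luma < dark_threshold
            out_row.append(0 if is_ink else 255)
        result.append(out_row)
    return result


image = [
    [(255, 0, 0), (10, 10, 10)],      # red stamp, black text
    [(255, 255, 255), (0, 0, 200)],   # white paper, dark blue stamp
]
print(drop_color(image))  # [[255, 0], [255, 255]]
```

The dark blue pixel has a luma of about 22, so in a grayscale or black and white scan it would be indistinguishable from black text. Only the color channels reveal that it is not ink, which is one reason a color scan gives the recognition step more to work with.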
The name IRIS stands for Image Recognition Integrated Systems. IRIS develops software and products that help people increase their productivity when scanning and converting documents.
IRIS develops technologies and products for intelligent document recognition and markets its portfolio globally through strong partnerships. The partner network is one of the three pillars of the IRIS Products & Technologies Division, along with OEM partners and proprietary solutions.
"Less paper, more content" is IRIS's motto, reflecting its aim of helping reduce paper usage through scanning, editing, and sharing digital files. IRIS is among the pioneers in text recognition and is today a leader in solutions that deliver real value for its customers' investments.
IRIS manages information extraction from both kinds of sources, paper documents and digital files, allowing users to harness their content at the lowest cost and with the highest return on investment.