OCR Technology

OCR (Optical Character Recognition) is software that enables a computer to read static images of text and convert them into editable, searchable data. It involves the electronic identification and digital encoding of printed text using an optical scanner and specialized software.

It is a common method for digitizing printed text so that it can be edited and searched electronically, stored more compactly, displayed online, and used in automated processes such as cognitive computing, automatic translation, text-to-speech conversion, key data extraction, or text extraction.

OCR is widely used for converting paper-based records (identity documents, invoices, bank statements, computerized receipts, business cards, mail, or any other relevant documentation) into machine-readable text data.

OCR involves three steps:

Step 1

Opening and scanning a document in the OCR software

Step 2

Recognizing the document within the software

Step 3

Saving the document produced by the software in a format of your choice

Basic Features, Specifications, and Recommendations for OCR

Image Module

You can upload and save images in various formats such as BMP, PNG, TIFF, PDF, and JPEG. JPEG2000 and JBIG2 compression are available through a separate extension.

Preprocessing Capability

Clean up the original images with features such as adaptive binarization, despeckle filters, deskewing, and document rotation. With a separate extension, you can also remove dark edges, remove lines, and drop colors.
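To illustrate what adaptive binarization means in practice, here is a minimal sketch in plain Python. It uses a generic local-mean threshold, which is an assumption chosen for clarity and not a description of the IRIS algorithm; the function name and the `window` and `offset` parameters are hypothetical.

```python
def adaptive_binarize(gray, window=3, offset=10):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    Each pixel is compared with the mean of its local neighbourhood
    instead of one global threshold, so uneven lighting matters less.
    Returns 0 for ink and 255 for background.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    half = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Clip the neighbourhood window at the image borders.
            y0, y1 = max(0, y - half), min(height, y + half + 1)
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            values = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(values) / len(values)
            # A pixel clearly darker than its surroundings counts as ink.
            row.append(0 if gray[y][x] < local_mean - offset else 255)
        result.append(row)
    return result


sample = [
    [200, 200, 200],
    [200, 50, 200],
    [200, 200, 200],
]
binary = adaptive_binarize(sample)
```

In this sample only the dark center pixel is classified as ink. A production engine would typically work on much larger windows and combine this step with despeckling and deskewing.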

Recognizes 137+ Languages

The engine recognizes 137+ languages, with add-ons covering Asian languages, Hebrew, Arabic, banking fonts, and ICR.

Recognizes Barcodes

Our barcode module recognizes popular 1D barcodes such as Code 39, Code 128, EAN, and UPC. An additional extension decodes 2D barcodes, including PDF417, QR codes, and Data Matrix.
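Once a barcode has been decoded, its value is usually verified before it is used. As an illustration, the sketch below checks the standard EAN-13 check digit in plain Python. The function names are hypothetical, and this is not part of the IRIS module's API.

```python
def ean13_check_digit(first_twelve):
    """Compute the EAN-13 check digit for the first 12 digits of a code."""
    if len(first_twelve) != 12 or not (first_twelve.isascii() and first_twelve.isdigit()):
        raise ValueError("expected exactly 12 digits")
    # Counting from the left, digits alternate between weight 1 and weight 3.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first_twelve))
    return (10 - total % 10) % 10


def is_valid_ean13(code):
    """Return True if `code` is 13 digits with a correct check digit."""
    return (
        len(code) == 13
        and code.isascii()
        and code.isdigit()
        and ean13_check_digit(code[:12]) == int(code[12])
    )


print(is_valid_ean13("4006381333931"))
print(is_valid_ean13("4006381333932"))
```

A 12-digit UPC-A code can be checked with the same function by prefixing it with a zero.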

Saving Documents in Multiple Formats

Output document formats in IRIS OCR include PDF, PDF/A, HTML, XML, RTF, TXT, ODT, WordML, SpreadsheetML, CSV, DOCX, XLSX, and XPS. An additional compression module generates compressed PDF and XPS files using our iHQC technology.

Page Processing

The OCR engine supports zone recognition and automatically detects page orientation. It also corrects the perspective of photographed documents and removes hole-punch marks automatically.

Handwriting

OCR does not recognize cursive handwriting, because optical character recognition is defined for printed text only. Handwritten text can be recognized only if all characters are written separately (hand-printed text). This recognition scenario is called ICR (Intelligent Character Recognition) and is most commonly used for zone recognition and form processing.

Image Resolution

We recommend scanning documents at a resolution of 300 dpi; this also applies to fonts of 8-10 points. For fonts smaller than 8 points, we recommend a resolution of 400-600 dpi.

A lower resolution degrades both recognition quality and speed. For the best quality and speed, we recommend a font size between 12 and 20 points.
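The resolution guidance above can be written as a small lookup, for example to pick scanner settings in a batch script. This is a minimal sketch in Python; the function names are hypothetical and the thresholds are taken directly from the recommendations in this section.

```python
def recommended_dpi(font_size_pt):
    """Return the recommended scan resolution as a (min_dpi, max_dpi) range."""
    if font_size_pt <= 0:
        raise ValueError("font size must be positive")
    if font_size_pt < 8:
        # Very small print needs a higher resolution.
        return (400, 600)
    # General recommendation, which also covers 8-10 point fonts.
    return (300, 300)


def is_optimal_font_size(font_size_pt):
    """True if the font size falls in the 12-20 point range recommended above."""
    return 12 <= font_size_pt <= 20


for size in (6, 9, 14):
    print(size, recommended_dpi(size), is_optimal_font_size(size))
```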

Scanning

In general, characters printed on gray or colored backgrounds can cause recognition errors because the background makes the characters harder to read. Our technology interprets colors separately and can remove them during the recognition process if they overlap characters.

Therefore, we recommend scanning in color if documents have colored areas. Even if the documents are purely black and white, we still recommend scanning in color to keep the workflow uniform (the speed difference between color and black-and-white scanning is minimal).

IRIS - OCR Technology Provider

The name IRIS stands for Image Recognition Integrated Systems. IRIS develops software and products that help people increase their productivity when scanning and converting documents.

IRIS develops technologies and products for intelligent document recognition and markets its portfolio globally through strong partnerships. The partner network is one of the three pillars of the IRIS Products & Technologies Division, along with OEM partners and proprietary solutions.

"Less paper, more content" is IRIS's motto, reflecting how scanning, editing, and sharing digital files can reduce paper usage. IRIS is among the pioneers in text recognition and today is a leader in solutions that deliver a real return on investment.

IRIS manages information extraction from both paper documents and electronic files, allowing users to harness their content at the lowest cost and with the highest return on investment.