iText launches iText pdfOCR, a powerful open-source product enabling text recognition in scanned documents and conversion into editable PDFs

June 30, 2020 - 03:29
iText launches iText pdfOCR, a powerful open-source product enabling text recognition in scanned documents and conversion into editable PDFs

LONDON, UK - MediaOutReach - 30 June 2020 - iText Group NV, a globally recognized thought-leader andinnovator in PDF libraries and solutions, today announced the launch of iTextpdfOCR, the newest addition to their award-winning software offering. 

 

iTextpdfOCR, which is part of the renowned iText 7 PDF SDK, offers OpticalCharacter Recognition (OCR) functionality to convert printed text in scanneddocuments and images into a fully searchable PDF/A-3u compliant format (PDFversion 1.7) and make accessing those texts easier and faster. Withoutmachine-readable text, printed or scanned documents cannot be searched, indexedor interpreted. Logical follow-up actions could be data extraction with iText pdf2Data, secure contentredaction with iText pdfSweep, or multilingualdocument recreation with iText pdfCalligraph. With repurposing datawith the low-code document generator iText DITO® often being the finalcherry on the cake.


The iTextpdfOCR add-on is built on the Tesseract OCR engine technology. Tesseractsupports over 100 languages and was originally developed by Hewlett-Packard('85), and was released under the Apache open source license in 2005. Since2006, its development has been sponsored by Google.

 

"With COVID-19urging companies to accelerate their digital transformation projects,organizations are forced to explore new ways of accessing and managing theirdata -- existing and new. Being a leader in the digital documents space, we'repleased to be at the forefront of this new era. As such, I am very proud toannounce the latest addition to our PDF library for today's new world: thanksto the OCR capabilities of iText pdfOCR many new opportunitieswill open up for users and enterprises that want to maximize their datapotential." Yeonsu Kim, CEO at iText Group NV stated.

 

"Staying trueto our open-source roots, we've decided to build iTextpdfOCR upon the open-source Tesseract OCR Engine. With this, we wishto reconfirm our positioning as an open-source company - a value which isappreciated by our millions of users and clients."

 

"With this newaddition to our PDF library, developers will now be able to leverage datalocked away in documents which until now weren't accessible. Our latest productenables them to enlarge their digital workflow capabilities by accessing thedata buried in scanned files and deploy it for any action or purpose they ortheir end-user would like." Tony Van den Zegel, VP of Products & Marketingat iText Group NV and General Manager at iText Software Belgium, said.

 

The applicationsof iTextpdfOCR are various: for instance, archiving of historical documents,translations of legal documents, automatic data entry while processing allsorts of physical applications or claims, and sorting of otherwise not editableprinted or scanned documents.

 

Please tune infor live demos on 9 July 2020. More information can be found here.


E-paper