Tesseract (software)

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License 2.0. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005, and its development has been sponsored by Google since 2006.

Tesseract
Original author(s): Ray Smith, Hewlett-Packard
Developer(s): Google and others
Stable release: 5.3.4 / 18 January 2024
Written in: C and C++
Operating system: Linux, Windows, and macOS
Available in:
Interface: English
Recognition: Afrikaans, Albanian, Arabic, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Catalan, Czech, Cherokee, Croatian, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, German, Greek, Hindi, Hebrew, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Macedonian, Maltese, Malay, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, and Vietnamese (more can be added using included training files)
Type: Optical character recognition
License: Apache License 2.0
Website: github.com/tesseract-ocr

In 2006, Tesseract was considered one of the most accurate open-source OCR engines available.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.