Questions tagged [tesseract]

Tesseract is an open source, multilingual OCR (Optical Character Recognition) engine originally developed at HP Labs. Its development is now sponsored by Google, and it is licensed under the Apache License 2.0. It currently recognizes 107 languages. Tesseract is written primarily in C++ and C. The project is hosted at https://github.com/tesseract-ocr/tesseract and its support forum is at http://groups.google.com/group/tesseract-ocr.

4350 questions

4 answers

Tesseract ocr PDF as input

I am building an OCR project and I am using a .NET wrapper for Tesseract. The samples that come with the wrapper don't show how to deal with a PDF as input. Using a PDF as input, how do I produce a searchable PDF using C#? I have used the Ghostscript library…

c# ocr tesseract

asked Apr 15 '15 at 17:48

acrab

3 answers

How to find parameters supported in Tesseract OCR config file

I want to know what parameters the config file used by Tesseract OCR accepts, how to write a config file, etc. I can't find any documentation about this on their site. How can I determine what parameters are supported, and what they mean?

tesseract

asked Oct 22 '12 at 08:05

sashoalm

1 answer

How do I train tesseract 4 with image data instead of a font file?

I'm trying to train Tesseract 4 with images instead of fonts. The docs explain only the approach with fonts, not with images. I know how it works when I use a prior version of Tesseract, but I didn't get how to use the box/tiff…

ocr tesseract lstm training-data

asked Apr 11 '17 at 17:47

claim

3 answers

How can I use async to increase WinForms performance?

I was doing a processor-heavy task, and every time I start executing that command my WinForm freezes; I can't even move it around until the task is completed. I used the same procedure from Microsoft, but nothing seems to have changed. My working…

c# asynchronous tesseract

asked Feb 19 '13 at 16:51

Serak Shiferaw

5 answers

Best way to recognize characters in screenshot?

What would you recommend for recognizing all characters from a screenshot? The screenshot is perfectly clear (only black text on a white background). Also, I can choose any standard font for the text (installed on Windows). I have tried some OCR ways…

fonts ocr tesseract pattern-recognition

asked Nov 17 '10 at 21:20

Tomek

2 answers

Tesseract traineddata not working in Swift 3.0 project using version 4.0

I'm attempting to use Tesseract-OCR-iOS in a new Swift 3.0 project. I'm using Xcode Version 8.1 (8B62). CocoaPods is version 1.1.1. When I attempt to use tesseract.recognize(), my app crashes and I get the following output in the…

ios swift ocr tesseract

asked Dec 13 '16 at 21:45

Adrian

5 answers

How to preserve document structure in tesseract

I am using Tesseract OCR to extract text from an image. Preserving the structure of the document is very important to me. Currently Tesseract does not preserve the structure; in fact, it changes the order of the text. My input is the image below, and the…

ocr tesseract

asked Mar 24 '14 at 12:44

Sar009

4 answers

Tesseract Trained data

I am trying to extract data from receipts and bills using Tesseract, version 3.02. I am using only the English data, and the output accuracy is still only about 60%. Is there any trained data available that I can just drop into the tessdata folder?

tesseract

asked Aug 26 '12 at 17:14

nicky

5 answers

Why Tesseract OCR library (iOS) cannot recognize text at all?

I'm trying to use the Tesseract OCR library in my iOS application. I downloaded the tesseract-ios library from GitHub, and when I tried to recognize a simple text image I got garbage instead. Here is an image of what I tried to recognize: I got unreadable…

ios objective-c ocr tesseract

asked Jun 18 '13 at 12:42

MainstreamDeveloper00

4 answers

Tesseract does not recognize single characters

How to reproduce: (1) Create a new image with Paint (any size). (2) Add the letter A to this image. (3) Try to recognize it: Tesseract will not find any letters. (4) Copy-paste this letter 5-6 times into the image. (5) Try to recognize it again: Tesseract will find all the letters. Why?

ocr tesseract

asked Mar 09 '12 at 09:55

artem

5 answers

Tesseract OCR on AWS Lambda via virtualenv

I have spent all week attempting this, so this is a bit of a Hail Mary. I am attempting to package up Tesseract OCR into AWS Lambda running on Python (I am also using Pillow for image pre-processing, hence the choice of Python). I understand how to…

python amazon-web-services virtualenv tesseract aws-lambda

asked Nov 07 '15 at 21:57

Andy G

2 answers

Tesseract receipt scanning advice needed

I have struggled off and on again with Tesseract for various OCR projects and I found a use case today which I thought would be a slam dunk for it but after many hours I am still coming away unsatisfied. I wanted to pose the problem here and see…

ocr tesseract receipt

asked Jul 26 '15 at 03:39

Jim Sanders

2 answers

How to avoid "Permission denied" while installing a package for Python without sudo

I am trying to install the Tesseract wrapper for Python as user mike so that I can import tesseract. I'm following the guide at https://code.google.com/p/python-tesseract/wiki/HowToCompilePythonTesseractForCentos. However, when I execute python…

python centos tesseract python-tesseract

asked Mar 21 '14 at 06:07

Anthony

2 answers

Improving OCR performance on multi-paragraph scans

I'm working on a project that involves extracting text from scientific papers stored in PDF format. For most papers, this is accomplished quite easily using PDFMiner, but some older papers store their text as large images. In essence, a paper is…

python ocr tesseract

asked Jul 25 '12 at 17:50

Louis Thibault

2 answers

Tesseract OCR confuses slashed 0 as 8

I have trained tesseract on the terminus font, but no matter what, I can't get it to recognize the 0s. I am using the jTessEditor to create the training tif and boxes. Even when validating, it reads all 0s as 8s. Is there anything I am missing? Here…

python ocr tesseract

asked Oct 31 '18 at 19:10

Vilsol
