Questions tagged [tesseract]

Tesseract is an open source, multilingual OCR (Optical Character Recognition) engine originally developed at HP Labs. It is now sponsored by Google and licensed under the Apache License 2.0. It currently recognizes 107 languages. Tesseract is written primarily in C++ and C. The project is hosted at https://github.com/tesseract-ocr/tesseract and its support forum is at http://groups.google.com/group/tesseract-ocr.

4350 questions

2 answers

Can I use OCR to detect font style (bold, italic)?

I am interested in using OCR to extract bold and italic words from a simple text. For example, if I input a clear image with text like so: "The quick brown fox jumps over the lazy dog." I would like to get an output like so: bold("brown", "jumps"),…

ocr font-face tesseract

asked Mar 02 '11 at 04:17

vamin

4 answers

pytesseract using tesseract 4.0 numbers only not working

Has anyone tried to get numbers only when calling the latest version of Tesseract 4.0 in Python? The code below worked in 3.05 but still returns characters in 4.0. I tried removing all config files except the digits file and it still didn't work; any help would be…

python tesseract

asked Oct 04 '17 at 21:03

CuriousGeorge

1 answer

Pytesseract set character whitelist

Does anyone know how to set the character whitelist for Pytesseract? I want it to only output A-z and 0-9. Is this possible? I have the following: img = Image.open('test.jpg') result = pytesseract.image_to_string(img, config='-psm 6') I'm getting…

python ocr tesseract python-tesseract

asked Apr 30 '17 at 10:35

Minato10

1 answer

How to extract text from a directory of PDF files efficiently with OCR?

I have a large directory of PDF files (images). How can I efficiently extract the text from all the files inside the directory? So far I tried to: import multiprocessing import textract def extract_txt(file_path): text =…

python python-3.x parallel-processing tesseract apache-tika

asked Apr 28 '17 at 05:09

john doe

1 answer

Tesseract user-patterns

Does anyone know how to use user patterns (user_patterns_suffix) in Tesseract? Could you advise me on how to use them and how to test that they work? I tried to follow the Tesseract guide (Tesseract user-patterns), but I didn't see it affect the result at…

tesseract

asked Jun 20 '13 at 09:20

kha nguyen

2 answers

Suggestions for digit recognition

I'm writing an Android app to extract a Sudoku puzzle from a picture. For each cell in the 9x9 Sudoku grid, I need to determine whether it contains one of the digits 1 through 9 or is blank. I start off with a Sudoku like this: I pre-process the…

android image-processing opencv ocr tesseract

asked Nov 10 '12 at 05:59

1''

2 answers

Alternative to Tesseract OCR Training?

For the past 3 months I've been trying to train Tesseract to identify a collection of images I have. Due to a real lack of proper documentation and a very high level of complexity, I'm starting to give up on Tesseract as a solution. I'm…

ocr tesseract

asked Apr 01 '11 at 06:06

Asaf

7 answers

Tesseract OCR Library - Learning Font

I'm using a compiled .NET version of this OCR engine, which can be found at http://www.pixel-technology.com/freeware/tessnet2/ I have it working. However, the aim is to read license plates, and sadly the engine really doesn't accurately…

c# image-processing ocr tesseract

asked Feb 05 '11 at 18:58

Ash

2 answers

Doing OCR with R

I have been trying to do OCR within R (reading PDF data that is stored as a scanned image). I have been reading about this at http://electricarchaeology.ca/2014/07/15/doing-ocr-within-r/ This is a very good post. Effectively 3 steps: convert pdf to ppm (an…

r shell pdf ocr tesseract

asked Aug 13 '15 at 05:04

anshuk_pal

1 answer

Improve Tesseract OCR results with blurred text

I am working on OCR recognition of printed text. In particular I am focusing on the preprocessing step to improve the results of the Tesseract engine. I have already obtained good results with adaptive thresholding, noise removal, text deskew,…

image-processing ocr tesseract motion-blur

asked Dec 27 '14 at 21:56

Marco Ancona

2 answers

iOS Tesseract OCR Image Preparation

I would like to implement an OCR application that recognizes text from photos. I succeeded in compiling and integrating the Tesseract engine in iOS, and I get reasonable detection when photographing clear documents (or a photoshot…

ios image-processing ocr tesseract

asked Nov 22 '12 at 10:50

alandalusi

1 answer

Image processing for OCR with leptonica (inverse color text)

I am trying to process the following image with leptonica to extract text with tesseract. Original Image: Tesseract on the original image yields this: i s l D2J1FiiE-l191x1iitmwii9 uhiaiislz-2 Q ~37 Bottom linez With a little time! you can learn…

image-processing ocr tesseract

asked Jul 26 '12 at 21:48

jasonlfunk

1 answer

Difference between Tesseract 3 and Tesseract 4?

What are the major differences between Tesseract 3 and Tesseract 4? And why should I choose one over the other?

ocr tesseract

asked Jan 29 '18 at 09:55

F.Lin

1 answer

Where is the default tesseract installation folder on a mac?

I've just installed Tesseract through Homebrew. I need to put some files inside the tessdata folder, but I can't find it anywhere on my Mac. I searched for "tesseract" in the Finder and the search returned nothing, and I couldn't find anything on Google…

macos tesseract

asked Oct 02 '16 at 00:44

Barbara

1 answer

Why am I getting "tiff page 1 not found" Leptonica warning in Tesseract?

I just started using Tesseract. I am following the instructions described here. I have created a test image like this: training/text2image --text=test.txt --outputbase=eng.Arial.exp0 --font='Arial' --fonts_dir=/usr/share/fonts Now I want to train…

ocr tesseract tiff

asked Oct 22 '15 at 10:56

Mikayel Egibyan
