Questions tagged [tesseract]

Tesseract is an open-source, multilingual OCR (Optical Character Recognition) engine originally developed at HP Labs. Its development is now sponsored by Google, and it is licensed under the Apache License 2.0. It currently recognizes 107 languages. Tesseract is written primarily in C++ and C. The project is hosted at https://github.com/tesseract-ocr/tesseract, and its support forum is at http://groups.google.com/group/tesseract-ocr.

4,350 questions
21 votes, 4 answers

How to extract text from an image in an Android app

I am working on a feature for my Android app. I would like to read text from a picture and then save that text in a database. Is using OCR the best way? Is there another way? Google suggests in its documentation that the NDK should only be used if strictly…
MrAnderson1992
  • 399
  • 1
  • 3
  • 11
21 votes, 1 answer

Tesseract OCR user patterns

Is there any way to get Tesseract to match only user-specified words or patterns? The manual claims it is possible, yet I cannot find a single documented instance on the internet of somebody getting this working. Here are many examples of people…
Michael Connor
  • 473
  • 1
  • 4
  • 9
21 votes, 3 answers

Text detection on Seven Segment Display via Tesseract OCR

The problem I am running into is extracting text from an image, for which I have used Tesseract v3.02. The sample images I need to extract text from are meter readings. Some of them have a solid sheet background and…
yunas
  • 4,143
  • 1
  • 32
  • 38
21 votes, 1 answer

Getting error: "bad read of inttemp!" when training a new font in Tesseract 2

I'm trying to train Tesseract for a new font that can be used in my Android app. I need to train for digits only, so I created one training image, a box file, and a unicharset file. I followed the training instructions, but when I tried to run…
Dipin
  • 1,085
  • 6
  • 19
21 votes, 4 answers

Tesseract 3 (OCR) - .NET Wrapper

http://code.google.com/p/tesseractdotnet/ I am having a problem getting Tesseract to work in my Visual Studio 2010 projects. I have tried console and WinForms projects, and both have the same outcome. I have come across a DLL by someone else who claims to…
Jpin
  • 1,527
  • 5
  • 18
  • 27
20 votes, 4 answers

How can I use Tesseract in Android?

I have searched the net for a couple of hours. I found many answers saying we need to use the NDK, etc., for Tesseract on Windows. But I didn't find any proper step-by-step explanation of what should be done once the NDK is installed. How to get the .so…
PrincessLeiha
  • 3,144
  • 4
  • 32
  • 53
20 votes, 2 answers

Tesseract OCR fails to detect varying font size and letters that are not horizontally aligned

I am trying to detect the text on these price labels, which is always clearly preprocessed. Although Tesseract can easily read the text written above the price, it fails to detect the price values. I am using the Python bindings (pytesseract), although it also fails to read from…
NONONONONO
  • 612
  • 1
  • 6
  • 10
19 votes, 2 answers

Tesseract confuses two numbers

I'm writing an application to scan numbers from an image. The numbers use the OCR-B font and may also contain + and > characters. This is my source image: The scans using Tesseract weren't very good, even when limiting the character set to…
Danilo Bargen
  • 18,626
  • 15
  • 91
  • 127
19 votes, 1 answer

Tesseract handwriting with dictionary training

I have a dictionary of words in a text file, separated by newlines, and I want to recognize handwriting using Tesseract and output the nearest matching line in the text file. This is the first time I'll be using Tesseract, and it's already in…
Ruel
  • 15,438
  • 7
  • 38
  • 49
18 votes, 4 answers

What's the best way to OCR as much text as possible from video game screenshots?

I'm trying to use the Tesseract OCR tool to extract text from video games (I'm preprocessing screenshots, passing them to the command-line tool, and parsing its TSV output). I'd like to use it for test automation, not unlike Selenium web testing.…
Roman A. Taycher
  • 18,619
  • 19
  • 86
  • 141
18 votes, 1 answer

pytesseract cannot find the file specified

My code is straightforward and is the following: import pytesseract from PIL import Image img = Image.open('C:/temp/foo.jpg') img.load() i = pytesseract.image_to_string(img) and the error response I get back is: Traceback (most recent call…
jason m
  • 6,519
  • 20
  • 69
  • 122
18 votes, 1 answer

Chinese character recognition using Tesseract OCR

I have been using the Tesseract 3.0.2 OCR SDK for image text extraction. But when I pass images of Chinese text through OCR, Tesseract doesn't return the Chinese characters; instead, I get numeric and English characters. But I…
Nishant Tyagi
  • 9,893
  • 3
  • 40
  • 61
18 votes, 3 answers

What's the best image input type for Tesseract?

I'm using Tesseract on a project and want to know which image input type gives the best output. Is binary TIFF the best input, or is there something else?
chostDevil
  • 1,041
  • 5
  • 17
  • 24
17 votes, 4 answers

Converting cv::Mat for Tesseract

I'm using OpenCV to extract a subimage of a scanned document and would like to use Tesseract to perform OCR on this subimage. I found out that I can use two methods for text recognition in Tesseract, but so far I haven't been able to find a working…
Pedro
  • 4,100
  • 10
  • 58
  • 96
17 votes, 12 answers

(-215:Assertion failed) !_src.empty() in function 'cv::cvtColor' with cv::imread

I am trying to recognize text from an image and then output that text; however, I get this error: Traceback (most recent call last): File "C:/Users/Benji's Beast/AppData/Local/Programs/Python/Python37-32/imageDet.py", line 41, in…
Benji
  • 197
  • 1
  • 1
  • 5