Questions tagged [tesseract]

Tesseract is an OCR (Optical Character Recognition) engine originally developed at HP Labs and now available as an open source library with development sponsored by Google.

Tesseract is an open-source, multilingual OCR (Optical Character Recognition) engine originally developed at HP Labs. It is now sponsored by Google and licensed under the Apache License 2.0. It currently recognizes 107 languages. Tesseract is primarily written in C++ and C. The project is hosted at https://github.com/tesseract-ocr/tesseract and its support forums are found at http://groups.google.com/group/tesseract-ocr.
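As a rough illustration of how the engine is commonly driven, here is a minimal Python sketch that builds and runs the basic command-line form `tesseract <image> <output base> -l <lang>`. The helper names `build_tesseract_command` and `run_ocr` and the file names are hypothetical, and the sketch assumes a `tesseract` executable is installed and on the PATH; it is not taken from the project's own documentation.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argv list for: tesseract <image> <output_base> -l <lang>.

    Hypothetical helper: it only assembles the arguments, it does not run anything.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run the tesseract CLI and return the recognized text (hypothetical helper)."""
    # Assumption: the tesseract executable is installed and reachable via PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # By default the CLI writes plain text to "<output_base>.txt".
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

With Tesseract installed, calling `run_ocr("scan.png", "out")` on a hypothetical image `scan.png` would write `out.txt` and return its contents. Building the argument list separately keeps the command easy to inspect before anything is executed.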

4350 questions
12
votes
2 answers

Does Tesseract OCR use neural networks as its default training mechanism?

Sorry, this is probably a dumb question, but I am fairly new to machine learning and Tesseract OCR. I have heard that Tesseract OCR can be trained. What I need to know is: does Tesseract OCR use neural networks as its default training…
HarshaXsoad
  • 776
  • 9
  • 30
12
votes
2 answers

Android: How to improve the numbers within the image retrieved by tesseract ocr?

I made a simple Android app that reads images and retrieves the numbers in them as text. The problem is that the accuracy is only about 60%, and some unwanted noise shows up as well. I do perceive that the accuracy cannot be good as…
Jennifer
  • 1,822
  • 2
  • 20
  • 45
12
votes
6 answers

Installing Tesseract-OCR on CentOS 6

I'm trying to install Tesseract-OCR on my server and have added what I believe to be the correct repos. However, when I try to install it, the package is not found. I tried adding rpmforge, but to no avail. Any ideas from somebody that has done…
William
  • 191
  • 1
  • 1
  • 11
12
votes
4 answers

Tess4j doesn't use its tessdata folder

I am using tess4j, the java wrapper of Tesseract. I also have the normal Tesseract installed. I am not exactly sure how tess4j is meant to work, but since it comes with a tessdata folder, I can assume that you would put the language data files…
Kiwi Bird
  • 145
  • 1
  • 2
  • 9
12
votes
2 answers

Tesseract: Specifying regions of text

I'm using tesseract-ocr-3.01 to scan many forms. The forms all follow a template, so I already know where the regions/rectangles of text are. Is there a way to pass those regions to tesseract when using the command-line tool?
sashoalm
  • 75,001
  • 122
  • 434
  • 781
11
votes
3 answers

How to separate title and headers from body text in image

I am using tesseract (through the python wrapper) in order to extract text from documents. These documents do not include any images or tables, simply text. Is there any option to distinguish the titles/headings from the text? Ideally I want to be…
Prikers
  • 858
  • 1
  • 9
  • 24
11
votes
3 answers

Tesseract - ERROR net.sourceforge.tess4j.Tesseract - null

I created a Java application that uses Tesseract to convert a given image or PDF to a string. When I run it on my machine as a JUnit unit test it works fine, but when running the full system, which is a RESTful API run by Tomcat…
Adi
  • 2,074
  • 22
  • 26
11
votes
1 answer

Tesseract OCR force pattern

I want to read a specific character sequence with Tesseract, as in this post: Tesseract OCR: is it possible to force a specific pattern? I have tried the bazaar pattern matching in Tesseract with the pattern \d\d\d\A\A, and the OCR still recognizes other words…
leoden
  • 301
  • 3
  • 10
11
votes
1 answer

Can I test Tesseract OCR from the Windows command line?

I am new to Tesseract OCR. I tried to convert an image to TIFF and run it through Tesseract using cmd on Windows to see the output, but I couldn't. Can you help me? What command should I use? Here is my sample image:
Akunar
  • 145
  • 1
  • 1
  • 9
11
votes
2 answers

How to recognize MICR codes in Android

I am trying to find a way to OCR the MICR codes from a document. For that I used the Tesseract library. With it I succeeded in recognizing text, but when it comes to MICR it fails to recognize it. Here is the sample MICR image which I want to…
Juned
  • 6,290
  • 7
  • 45
  • 93
11
votes
5 answers

iOS Tesseract: bad results

I just started to get my hands dirty with the Tesseract library, but the results are really really bad. I followed the instructions in the Git repository ( https://github.com/gali8/Tesseract-OCR-iOS ). My ViewController uses the following method to…
Dennis
  • 992
  • 1
  • 10
  • 23
11
votes
8 answers

Tess4j unsatisfied link error on mac OS X

I am trying to use tess4j for Tesseract and am having this issue in Eclipse on Mac OS X. My Tesseract works fine from the terminal, but trying to run Tesseract through tess4j throws an error: java.lang.UnsatisfiedLinkError: Unable to load…
nestrocuation
  • 219
  • 1
  • 4
  • 9
11
votes
2 answers

Get the exact position of text from an image in Tesseract

Using the GetHOCRText(0) method in Tesseract, I'm able to retrieve the text as HTML, and when presenting the HTML in a WebView I'm able to get the text, but the position of the text in the image is different from the output. Any ideas would be very helpful. …
srividya
  • 419
  • 6
  • 16
10
votes
2 answers

Strength of Dictionary in Tesseract 3

How do I increase or decrease the strength of the dictionary in Tesseract 3? The FAQ says I need to change the values of "NON_WERD" and "GARBAGE_STRING", but they do not exist in Tesseract 3.
10
votes
4 answers

Training Tesseract 3 to recognize numbers from real images of gas meters

I'm trying to train Tesseract to recognize numbers from real images of gas meters. The images that I use for training are taken with a camera, so there are many problems: poor image resolution, blurred images, poor lighting or low…
Alessandro
  • 101
  • 1
  • 4