Questions tagged [ocr]

Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. The following topics, although some are distinct fields of application, are also commonly referred to as OCR: Handwritten Text Recognition (HTR), Optical Word Recognition (OWR), Intelligent Character Recognition (ICR), Intelligent Word Recognition (IWR).

OCR is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website.

OCR @Wikipedia

6124 questions
30
votes
3 answers

Tesseract training for a new font

I'm still new to Tesseract OCR and, after using it in my script, noticed it had a relatively high error rate for the images I was trying to extract text from. I came across Tesseract training, which supposedly would be able to decrease the error rate for a…
user19235
  • 591
  • 1
  • 4
  • 7
30
votes
2 answers

How can I run Tesseract with multiple languages at once?

I have to analyze an image containing both English and Japanese text. When I run tesseract by default (-l eng), some Japanese characters are lost. Otherwise, if I run tesseract with Japanese (-l jpn), some English characters are lost (e.g. Email). How…
pars
  • 409
  • 1
  • 5
  • 10
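
For the multi-language question above, a minimal sketch, assuming the relevant language data files (here eng and jpn) are installed: the tesseract command line accepts several language codes joined with + in a single -l value. The helper name build_tesseract_command and the file name scan.png are hypothetical, and the snippet only builds and prints the command rather than running it.

```python
import shlex


def build_tesseract_command(image_path, languages, output_base="stdout"):
    """Build a tesseract CLI argument list for one or more languages.

    Hypothetical helper for illustration; it does not invoke tesseract.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Several languages are passed as one -l value, joined with '+'.
    lang_arg = "+".join(languages)
    return ["tesseract", image_path, output_base, "-l", lang_arg]


# scan.png is an assumed input file name.
cmd = build_tesseract_command("scan.png", ["eng", "jpn"])
print(shlex.join(cmd))  # tesseract scan.png stdout -l eng+jpn
```

The resulting list could be handed to subprocess.run once tesseract and both language packs are available on the machine.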
30
votes
8 answers

How to know if a PDF contains only images or has been OCR scanned for searching?

I have a bunch of PDF files that came from scanned documents. The files contain a mix of images and text. Some were scanned as images with no OCR, so each PDF page is one large image, even where the whole page is entirely text. Others were…
Bratch
  • 4,103
  • 5
  • 27
  • 32
29
votes
5 answers

How to install a language in Tesseract OCR

I have installed Tesseract OCR and it has only 'eng' and 'osd' in the language list. I need the German language. I tried the following command, brew install tesseract-ocr-deu, but I am getting an error. Error: No available formula with the name…
Lama Madan
  • 617
  • 1
  • 10
  • 22
29
votes
5 answers

Remove background noise from image to make text more clear for OCR

I've written an application that segments an image based on the text regions within it, and extracts those regions as I see fit. What I'm attempting to do is clean the image so OCR (Tesseract) gives an accurate result. I have the following image as…
Zy0n
  • 810
  • 2
  • 14
  • 33
29
votes
5 answers

iOS: Real Time OCR on top of live camera feed (similar to iTunes Redeem Gift Card)

Is there a way to accomplish something similar to what the iTunes and App Store Apps do when you redeem a Gift Card using the device camera, recognizing a short string of characters in real time on top of the live camera feed? I know that in iOS 7…
boliva
  • 5,604
  • 6
  • 37
  • 39
28
votes
4 answers

Understanding Freeman chain codes for OCR

Note that I'm really looking for an answer to my question. I am not looking for a link to some source code or to some academic paper: I've already used the source and I've already read papers and still haven't figured out the last part of this…
SyntaxT3rr0r
  • 27,745
  • 21
  • 87
  • 120
28
votes
5 answers

How can I use Tesseract OCR (or any other free OCR) in a small C++ project?

So what I heard after research is that the only solid free OCR options are either Tesseract or CuneiForm. Now, the Tesseract docs are plain horrible, all they give you is a bunch of Visual Studio code (for me on Windows) and from there you are on…
Marko29
  • 1,005
  • 4
  • 14
  • 25
28
votes
4 answers

Tesseract ocr PDF as input

I am building an OCR project and I am using a .Net wrapper for Tesseract. The samples that the wrapper has don't show how to deal with a PDF as input. Using a PDF as input, how do I produce a searchable PDF using C#? I have used the Ghostscript library…
acrab
  • 319
  • 1
  • 3
  • 5
27
votes
1 answer

Getting text from image on ios (image processing)

I am thinking of making an application that requires extracting text from an image. I haven't done anything similar and I don't want to implement the whole thing on my own. Is there any known library or open source code (supported for iOS,…
Vikram.exe
  • 4,565
  • 3
  • 29
  • 40
27
votes
4 answers

My own OCR-program in Python

I am still a beginner but I want to write a character-recognition program. This program isn't ready yet. And I edited a lot, therefore the comments may not match exactly. I will use 8-connectivity for the connected component labeling. from PIL…
kame
  • 20,848
  • 33
  • 104
  • 159
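
The connected component labeling with 8-connectivity mentioned in the excerpt above can be sketched as follows. This is an illustrative breadth-first version over a plain nested list, not the asker's code; the names label_components and NEIGHBOURS_8 and the sample image are assumptions.

```python
from collections import deque

# Offsets of all eight neighbours: horizontal, vertical and diagonal.
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1),
                (0, -1),           (0, 1),
                (1, -1),  (1, 0),  (1, 1)]


def label_components(grid):
    """Label foreground pixels (truthy values) using 8-connectivity.

    Returns (labels, count): labels has the same shape as grid, with 0 for
    background and 1..count for each connected component.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or labels[r][c]:
                continue  # background, or already labeled
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Flood outward from the seed pixel through all 8 neighbours.
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS_8:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical 3x4 binary image: the two top-left pixels touch only diagonally.
image = [
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
labels, count = label_components(image)
print(count)  # 2
```

With 4-connectivity the two diagonal pixels would count as separate components, which is why the choice matters for thin or slanted character strokes.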
27
votes
2 answers

Where can I find a free .Net (C#) library that I can use to scan and OCR documents?

I am searching for a free .Net (C#) library that I can use to scan from a document scanner, and then OCR the document, so I can get the text from it to save in a database. After some searching I cannot find one that works in Visual Studio 2010 and .Net…
RickardP
  • 2,558
  • 7
  • 34
  • 42
26
votes
8 answers

Can OCR software reliably read values from a table?

Would OCR Software be able to reliably translate an image such as the following into a list of values? UPDATE: In more detail the task is as follows: We have a client application, where the user can open a report. This report contains a table of…
GarethOwen
  • 6,075
  • 5
  • 39
  • 56
26
votes
1 answer

How do I train tesseract 4 with image data instead of a font file?

I'm trying to train Tesseract 4 with images instead of fonts. The docs explain only the approach with fonts, not with images. I know how it works when I use a prior version of Tesseract, but I don't understand how to use the box/tiff…
claim
  • 506
  • 6
  • 13
26
votes
2 answers

OpenCV MSER detect text areas - Python

I have an invoice image, and I want to detect the text on it. So I plan to use 2 steps: the first is to identify the text areas, and the second is to use OCR to recognize the text. I am using OpenCV 3.0 in Python for that. I am able to identify the text (including…
Amit Madan
  • 1,013
  • 2
  • 12
  • 23