Questions tagged [ocr]

Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. The following topics, although some are distinct fields of application, are also commonly referred to as OCR: Handwritten Text Recognition (HTR), Optical Word Recognition (OWR), Intelligent Character Recognition (ICR), Intelligent Word Recognition (IWR).

OCR is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website.

OCR @Wikipedia

Frequently-asked questions:

Simple Digit Recognition OCR in OpenCV-Python

6124 questions

6 answers

How do I segment a document using Tesseract then output the resulting bounding boxes and labels

I'm trying to get Tesseract to output a file with labelled bounding boxes that result from page segmentation (pre OCR). I know it must be capable of doing this 'out of the box' because of the results shown at the ICDAR competitions where contestants…

ocr tesseract hocr

asked Feb 18 '15 at 18:27

James Owers

1 answer

Using Microsoft OCR Library with JS/jQuery in VS 2013

I am currently working on a Windows 8.1 application and I am using web languages and mostly jQuery (Cordova type project) as it might be used on other platforms. I need to use the Microsoft OCR Library (not Tesseract or any other ones, I know them…

javascript cordova visual-studio-2013 ocr visual-studio-cordova

asked Apr 15 '15 at 13:33

ColonelMoumou

4 answers

Character recognition (OCR algorithm)

I am working on a project in which I have to develop an OCR algorithm (I have to read the text from an image and then convert it to a different language). So my first task is to get text from the image. Steps to complete the first task: Loading any image format…

ocr

asked Mar 03 '13 at 16:58

TLE

9 answers

What is the ideal font for OCR?

Does anybody have any experience with different fonts for OCR? I am generating an ID then trying to scan it with Tesseract. At the moment I am just trying different fonts by trial and error, but this seems pretty inefficient. I've tried the OCR* family of fonts, and…

fonts ocr tesseract

asked Nov 25 '08 at 01:06

Chris Lloyd

6 answers

Preprocessing image for Tesseract OCR with OpenCV

I'm trying to develop an app that uses Tesseract to recognize text from documents taken by a phone's camera. I'm using OpenCV to preprocess the image for better recognition, applying a Gaussian blur and a threshold method for binarization, but the…

opencv image-processing ocr tesseract

asked Mar 09 '15 at 05:57

Mauricio

6 answers

Recognize a number from an image

I'm trying to write an application to find the numbers inside an image and add them up. How can I identify the written number in an image? There are many boxes in the image; I need to get the numbers on the left side and sum them to give a total. How…

java image-processing ocr tesseract hough-transform

asked Apr 20 '15 at 10:45

Hash

3 answers

Is there an efficient algorithm for segmentation of handwritten text?

I want to automatically divide an image of ancient handwritten text by lines (and by words in the future). The first obvious part is preprocessing the image... I'm just using a simple digitization (based on the brightness of each pixel). After that I store data…

c# algorithm image-processing ocr genetic-algorithm

asked Nov 04 '11 at 19:55

Ernado

8 answers

Is there an OCR library that outputs coordinates of words found within an image?

In my experience, OCR libraries tend to merely output the text found within an image but not where the text was found. Is there an OCR library that outputs both the words found within an image as well as the coordinates (x, y, width, height) where…

ocr

asked Feb 18 '11 at 12:01

Adam Paynter

9 answers

Tesseract OCR simple example

Hi, can anyone give me a simple example of testing Tesseract OCR, preferably in C#? I tried the demo found here. I downloaded the English dataset and unzipped it to the C drive, and modified the code as follows: string path =…

c# ocr tesseract

asked May 16 '13 at 22:14

Will Robinson

6 answers

Using Tesseract from Java

I'm trying to build a sample application in Java that will read an image file and just output the text extracted from the image. I found the Tesseract project which seems promising, however, it's in C++. In order to use it, should I simply run it as…

java ocr tesseract

asked Dec 20 '12 at 14:45

Omnipresent

7 answers

How to remove all lines and borders in an image while keeping text programmatically?

I'm trying to extract text from an image using Tesseract OCR. Currently, with this original input image, the output has very poor quality (about 50%). But when I try to remove all lines and borders using Photoshop, the output improves a lot (~90%).…

image opencv image-processing computer-vision ocr

asked Nov 27 '15 at 03:26

wind

5 answers

OCR with the Tesseract interface

How do you OCR a TIFF file using Tesseract's interface in C#? Currently I only know how to do it using the executable.

c# ocr tesseract

asked Aug 27 '08 at 14:46

toh yen cheng

2 answers

Which OCR Engine is better: Tesseract or OCRopus?

I have tried Tesseract on an iPhone and assessed its accuracy to be 70% without image preprocessing. I also noticed that it might be poor at extracting digits. I have heard about the OCRopus OCR engine: which is better, Tesseract or OCRopus, in terms of…

ocr tesseract feature-extraction

asked Apr 05 '12 at 17:08

Ahmed Hussein

10 answers

Programmatically recognize text from scans in a PDF File

I have a PDF file, which contains data that we need to import into a database. The files seem to be PDF scans of printed alphanumeric text. Looks like 10 pt. Times New Roman. Are there any tools or components that will allow me to recognize…

pdf ocr

asked Oct 01 '08 at 16:23

Rob

2 answers

What OCR options exist beyond Tesseract?

I've used Tesseract a bit and its results leave much to be desired. I'm currently detecting very small images (35x15, without border, but have tried adding one with ImageMagick with no OCR advantage); they range from 2 chars to 5 and are a pretty…

php python ruby ocr tesseract

asked Mar 13 '12 at 19:31

ylluminate
