Questions tagged [ocr]

Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. The following topics, although some of them are distinct fields of application, are also commonly referred to as OCR: Handwritten Text Recognition (HTR), Optical Word Recognition (OWR), Intelligent Character Recognition (ICR), Intelligent Word Recognition (IWR).

It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website.

Frequently-asked questions:

Simple Digit Recognition OCR in OpenCV-Python

6124 questions

1 answer

Text Extraction from pdf image file

I have an image file and I want to extract text from it. I tried various OCR engines, but I am unable to find the relationship between the left-side entity and the right-side entity, because the OCR engine simply extracts text without the relationship…

asked Aug 27 '19 at 14:05

PRAYANK

0 answers

How to extract dates from a Certificate of Disability with Tesseract?

I want to develop a tool that extracts the sickness dates from a certificate of disability. In Germany these certificates are standardized forms ("Arbeitsunfähigkeitsbescheinigung") that contain dates like this: I tried using Tess4j and extracted…

date ocr tesseract noise

asked Aug 19 '19 at 13:14

jenald

2 answers

How to extract numbers from image using OpenCV and pytesseract image_to_string()?

I'm trying to extract the numbers from an image using OpenCV and the image_to_string() method from pytesseract, but the output is not good. I tried some pre-processing methods like resize and noise filters, but still can't get accurate results. How…

python opencv image-processing ocr python-tesseract

asked Aug 16 '19 at 19:29

Joseph

1 answer

Define multiple columns in tesseract OCR parameters?

I'm using OCR on historical newspapers that contain 6 columns per page. At present I use FineReader and define text blocks for each column. I'd like to use Tesseract. Tesseract gets the columns mostly right, but every few lines it reads into…

ocr tesseract

asked Aug 13 '19 at 19:55

Will Hanley

0 answers

How can I identify numbers from an image?

I'm writing a script that takes an image and crops it down to include only the number I want it to recognize. I have that part working fine. The numbers will be either single or double digit. I've tried using Google's Vision API, which works…

python ocr image-recognition number-recognition

asked Aug 06 '19 at 21:44

jimmyshadow1

1 answer

How can I extract data from a handwritten, scanned PDF using Python?

I have PDFs that are scanned copies of a structured feedback form. The form has checkboxes and spaces for handwritten notes. I am trying to extract the data from these PDFs and save it to an unstructured CSV file. Now using …

python ocr python-tesseract handwriting-recognition

asked Aug 04 '19 at 11:51

PranavM

1 answer

Detecting signed dots on piece of paper with c# / php / anylanguage

A worker has a printed piece of paper (standardized from a template) with options (say, checkboxes). He checks items and marks dots (this is OK, this is not, do this, do that) based on that report. I want to create a program (probably in C#) that will read…

c# php image ocr

asked Jul 27 '19 at 10:56

baron_bartek

1 answer

Improving pytesseract correct text recognition from image

I am trying to read a captcha using the pytesseract module. It gives accurate text most of the time, but not always. This is the code to read the image, manipulate it, and extract text from it: import cv2 import numpy as np import…

python opencv image-processing ocr python-tesseract

asked Jul 25 '19 at 21:27

Tony Montana

0 answers

Easy way to extract/validate data from OCR JSON result based on rules/selectors

My goal is to extract information from several different types of invoices and transform that input into a standard output. For now, all the invoices are in PDF format (original digital PDFs, not printed!), so I don't think I need OCR, but maybe in the…

ocr google-cloud-vision azure-cognitive-services pdftotext amazon-textract

asked Jul 22 '19 at 10:55

João Antunes

1 answer

Python : Geting issue on OCR while using python tesseract API interface

I used the pytesseract module for OCR, but it seems to be a slow process. So I followed "Pytesseract is too slow. How can I make it process images faster?". I used the code mentioned in…

linux python-3.x ocr opencv python

asked Jul 15 '19 at 09:43

Rajesh das

0 answers

Improve accuracy of tesseract ocr in android through preprocessing

The goal is to make an OCR app using Tesseract. I didn't want to use tess-two, as it works with an older version of Tesseract. After a little research I was able to find this library, which uses Tesseract 4 and is a fork of tess-two. I am able to…

java android image-processing ocr tesseract

asked Jul 09 '19 at 09:12

Akanksha Singh

1 answer

Why I get 0s as output when I tried to calculate accuracy for image segmented result?

I checked the accuracy of a segmentation method using the bboxPrecisionRecall function in Matlab version '9.4.0.857798 (R2018a) Update 2' and the test result of an algorithm on the IESK-ArDB dataset. The database is freely available here. Samples of…

matlab image-processing ocr image-segmentation

asked Jul 08 '19 at 13:24

N.white

2 answers

How To Scan a Seven Segment Display by Firebase ML kit Text Recognition?

The Text Recognition API in Firebase ML Kit is not recognizing the digital (seven-segment display) numbers that I am trying to scan from a weight scale. Is there any way to make this work? I tried the Dart package for Firebase ML…

flutter dart ocr firebase-mlkit text-recognition

asked Jul 04 '19 at 17:54

Peter aziz

2 answers

Improving Tesseract OCR accuracy on screenshot

Tesseract OCR on screenshots gives rather erratic results. Only some of the text seems to be recognized correctly, even though the image is completely black with white text over it. Even after I resize the image to 300 dpi, the accuracy remains low…

image-processing ocr tesseract training-data

asked Jun 19 '19 at 11:55

forever

4 answers

How to detect and rotate images in python

I have multiple PDF invoices that I am trying to parse. I convert them to images and use OCR to get the text from the images. One of the PDFs has 2 out of 3 pages rotated by 90 degrees. How do I detect these rotated pages and correctly rotate…

python pdf ocr python-tesseract image-preprocessing

asked Jun 19 '19 at 09:15

Developer
