Questions tagged [ocr]

Optical Character Recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. The following related fields, although some are distinct areas of application, are also commonly referred to as OCR: Handwritten Text Recognition (HTR), Optical Word Recognition (OWR), Intelligent Character Recognition (ICR) and Intelligent Word Recognition (IWR).

OCR is widely used to convert books and documents into electronic files, to computerize an office's record-keeping system, or to publish text on a website.

OCR on Wikipedia

6124 questions
2 votes, 1 answer

Best OCR approach on documents with different formats to find one specific information

Unfortunately, because of confidential data, I can't give a more specific explanation. The problem: I have a few documents that generally contain the same information but in different formats. In most cases, the value I am looking for is near…
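A minimal sketch of one common first step for this kind of problem, assuming the OCR text is already available (for example from pytesseract) and that the wanted value follows a recognizable label on the same line. The label variants in `LABELS` and the function `find_value_near_label` are hypothetical placeholders, since the real field is confidential; if the value is near its label but not on the same line, a layout-based approach using word bounding boxes would be needed instead.

```python
import re
from typing import Optional

# Hypothetical label variants; the real field name in the question is confidential.
# More specific labels come first so that "reference" alone does not capture "No".
LABELS = ["reference no", "ref. number", "reference"]


def find_value_near_label(ocr_text: str, labels=LABELS) -> Optional[str]:
    """Return the token that follows the first matching label, or None.

    Assumes the value sits on the same line as its label, optionally after
    ':' or '#'. Label words may be separated by any run of spaces or tabs,
    since OCR output often has irregular spacing.
    """
    for label in labels:
        words = r"[ \t]+".join(re.escape(word) for word in label.split())
        pattern = re.compile(words + r"[ \t]*[:#]?[ \t]*([A-Za-z0-9/-]+)", re.IGNORECASE)
        match = pattern.search(ocr_text)
        if match:
            return match.group(1)
    return None
```

Label order matters with this approach: a generic label tried first would match inside a more specific one and capture the wrong token.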
2 votes, 1 answer

Sending a batch request to the Azure Cognitive API for text OCR

I am calling the Azure Cognitive API for OCR text recognition and passing 10 images simultaneously (the code below accepts only one image at a time, so that is 10 independent requests in parallel), which is not efficient to…
The Gr8 Adakron
2 votes, 0 answers

OCR JPEG file to text

I am trying to convert the attached scanned JPEG file to text. When I use pytesseract or Tesseract, I get diacritics, so my output contains a lot of junk characters. Also, converting the JPEG to text is not working. I tried to read…
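If the diacritics are OCR noise rather than part of the language (the question does not say which language the image is in), a first check is whether the right language is passed to Tesseract, e.g. `pytesseract.image_to_string(img, lang="eng")` for English. As a post-processing fallback, the stdlib sketch below decomposes characters (NFKD), drops combining marks, then discards anything outside printable ASCII. The function names are hypothetical, and this would destroy legitimate accented text, so it only suits English-only documents.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters with NFKD and drop combining marks, e.g. 'é' -> 'e'.

    NFKD also expands compatibility characters such as the 'ﬁ' ligature to 'fi'.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def drop_junk(text: str) -> str:
    """Keep printable ASCII and newlines; everything else is treated as OCR noise."""
    return "".join(ch for ch in text if ch == "\n" or 32 <= ord(ch) < 127)


def clean_ocr_text(text: str) -> str:
    """Apply both filters: remove accents first, then remaining non-ASCII junk."""
    return drop_junk(strip_diacritics(text))
```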
2 votes, 0 answers

Extract text from an image using Python pytesseract

This is the first time I am working with OCR. I have an image and want to extract data from it. My image looks like this (image not included). I have 500 such images and will have to record the parameters and their respective values. I'm thinking of doing it…
chink
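Since the image itself is not shown, this is only a sketch under an assumed layout: each parameter appears on its own line as `name: value` or `name = value`. The hypothetical helper `parse_parameters` turns the string returned by `pytesseract.image_to_string` into a dict, which could then be collected for all 500 images and written out with the `csv` module.

```python
import re

# Assumed layout: "<parameter> : <value>" or "<parameter> = <value>" per line.
LINE_RE = re.compile(r"^\s*([^:=]+?)\s*[:=]\s*(.+?)\s*$")


def parse_parameters(ocr_text: str) -> dict:
    """Map parameter names to values; lines without a separator are skipped."""
    params = {}
    for line in ocr_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params
```

Lines that contain a colon for another reason (a time such as 12:30, for instance) would be split at the first separator, so the pattern would need tightening for real data.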
2 votes, 2 answers

How can I add a new font to Tesseract 4.0?

I'm making a text identification program and I want to train Tesseract 4.0 to identify a specific font (in Hebrew). How can I do it? I tried "trainyourtesseract.com" (which didn't work at all) and "jTessBoxEditor" (which I didn't understand how to…
yuval
2 votes, 1 answer

AttributeError: module 'pytesseract' has no attribute 'run_tesseract'

I am trying to use the run_tesseract function to get hOCR output for extracting text from bank receipt images. However, I am getting the above error message. I have installed Tesseract-OCR on my laptop and have also added its path to…
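In recent pytesseract releases, hOCR output can be requested without calling `run_tesseract` directly: `pytesseract.image_to_pdf_or_hocr(img, extension='hocr')` returns the hOCR document as bytes. The sketch below (class and function names are hypothetical) shows how words and their bounding boxes could then be pulled out with the standard library's `html.parser`, assuming Tesseract's usual `ocrx_word` spans whose `title` attribute contains `bbox x0 y0 x1 y1`.

```python
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every hOCR span of class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._text = []
            # The title looks like "bbox 36 92 96 116; x_wconf 95".
            for part in (attrs.get("title") or "").split(";"):
                fields = part.split()
                if fields and fields[0] == "bbox":
                    self._bbox = tuple(int(v) for v in fields[1:5])

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Only a closing span ends a word; nested tags such as <strong> are ignored.
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr(hocr: str):
    """Return a list of (word, bbox) tuples from an hOCR document string."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words
```

The bytes from `image_to_pdf_or_hocr` would need `.decode("utf-8")` before being passed to `parse_hocr`.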
2 votes, 1 answer

I need OCR for WPF

I need an OCR component for the InkCanvas control in WPF so I can recognize characters and replace the handwritten ones with clean text from OCR. Is there one?
kartal
2 votes, 1 answer

Tesseract OCR on binary image

I have a binary image like this (image not included), and I want to extract the numbers in it using Tesseract OCR in Python. I used pytesseract on the image like this: txt = pytesseract.image_to_string(img). But I am not getting good results. What can I do in…
Sreeram TP
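Tesseract generally reads dark text on a light background with some margin best, so common suggestions for a binary image of digits are to invert it if needed, pad it with a white border, and restrict recognition to a single line of digits. The sketch below shows that preprocessing on a plain list-of-rows image so it stays self-contained (with NumPy or OpenCV the same steps are one-liners). `prepare_binary`, `DIGITS_CONFIG` and the 10-pixel border are illustrative choices, and whether the character whitelist is honored depends on the Tesseract version.

```python
# Options often suggested for one line of digits: --psm 7 treats the image as a
# single text line; whitelist support varies between Tesseract versions.
DIGITS_CONFIG = "--psm 7 -c tessedit_char_whitelist=0123456789"


def prepare_binary(image, border=10):
    """Invert a 0/255 binary image if it is mostly black, then pad it with white.

    `image` is a list of equal-length rows of 0 (black) or 255 (white) values.
    """
    pixels = [p for row in image for p in row]
    if pixels.count(0) > len(pixels) / 2:
        # Mostly black usually means light text on a dark background: invert it.
        image = [[255 - p for p in row] for row in image]
    width = len(image[0]) + 2 * border
    padded = [[255] * width for _ in range(border)]
    padded += [[255] * border + row + [255] * border for row in image]
    padded += [[255] * width for _ in range(border)]
    return padded
```

The config string would then be passed as `pytesseract.image_to_string(img, config=DIGITS_CONFIG)`.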
2 votes, 0 answers

How do I remove geometric shapes that are the same color as the text from a captcha?

I'm trying to remove the thresh lines and shapes from a specific captcha. I removed a lot, but now I have a problem... In this captcha, there are geometric shapes of the same color as the text, like the example above. I've written code that removes a lot of…
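When noise shapes share the text's color, color thresholding cannot separate them, but their size and shape often can. A minimal sketch, assuming a 0/1 binary mask: label 4-connected components with a breadth-first flood fill and erase any component that is too small or too elongated to be a character. `remove_components` and its `min_area`/`max_aspect` thresholds are hypothetical and would need tuning for this captcha; OpenCV's `cv2.connectedComponentsWithStats` computes the same components and bounding boxes much faster on real images.

```python
from collections import deque


def remove_components(mask, min_area=20, max_aspect=5.0):
    """Erase 4-connected foreground blobs that look like noise rather than glyphs.

    `mask` is a list of rows of 0 (background) / 1 (foreground). A blob is erased
    if it has fewer than `min_area` pixels or its bounding box is more than
    `max_aspect` times longer in one direction than the other (thin lines).
    """
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first flood fill collects one component's pixels.
            queue, pixels = deque([(sy, sx)]), []
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            bh, bw = max(ys) - min(ys) + 1, max(xs) - min(xs) + 1
            aspect = max(bh, bw) / min(bh, bw)
            if len(pixels) < min_area or aspect > max_aspect:
                for y, x in pixels:
                    out[y][x] = 0
    return out
```

Shapes similar in size and proportions to the characters would survive this filter and need another criterion.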
2 votes, 2 answers

How to extract circular text from embossed object

I have an object with two codes printed on it. The text is curved: half of it is on the top side of the object and the other half is on the bottom side. Here is my sample image (not included). I am using OpenCV, deep learning approaches, and Tesseract to…
s.john
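One common way to handle text printed along a circle is to unwrap the ring into a straight strip with a polar transform and run OCR on the strip. The sketch below does nearest-neighbour polar sampling in pure Python; it assumes the circle's center and the radial band holding the text are already known (for example from a circle detector), and `unwrap_polar` is a hypothetical name. OpenCV's `cv2.warpPolar` provides a native implementation of the same idea.

```python
import math


def unwrap_polar(image, center, r_min, r_max, n_angles=360):
    """Resample a ring of `image` into a rectangle: rows = radius, columns = angle.

    `image` is a list of rows and `center` is (cx, cy) in pixel coordinates.
    Uses nearest-neighbour sampling; samples outside the image become 0.
    """
    cx, cy = center
    h, w = len(image), len(image[0])
    strip = []
    for r in range(r_min, r_max):
        row = []
        for k in range(n_angles):
            theta = 2 * math.pi * k / n_angles
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            row.append(image[y][x] if 0 <= x < w and 0 <= y < h else 0)
        strip.append(row)
    return strip
```

Text on the top and bottom halves runs in opposite directions around the ring, so each half of the strip will likely need to be cropped and flipped or rotated separately before OCR.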
2 votes, 0 answers

How to visualize the text layer in a pdf

I'm looking for a way to extract text and the position of that text from a PDF with a "text layer". My goal is to show a PDF with the extracted text as a layer and to have the user select certain lines as areas of interest. pdftotext only shows me…
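Poppler's `pdftotext` can also emit word positions: `pdftotext -bbox input.pdf output.html` writes an XHTML file in which each word is a `<word xMin=... yMin=... xMax=... yMax=...>` element inside a `<page>` element. Below is a minimal stdlib sketch for reading that file back, assuming this output format (`BboxWordParser` is a hypothetical name); the resulting boxes could then be drawn as a selectable overlay on the rendered page.

```python
from html.parser import HTMLParser


class BboxWordParser(HTMLParser):
    """Collect (page_number, text, (x_min, y_min, x_max, y_max)) per <word> element.

    HTMLParser lower-cases attribute names, hence 'xmin' rather than 'xMin'.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self.page = 0
        self._box = None  # box of the <word> element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "page":
            self.page += 1
        elif tag == "word":
            a = dict(attrs)
            self._box = tuple(float(a[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
            self._text = []

    def handle_data(self, data):
        if self._box is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "word" and self._box is not None:
            self.words.append((self.page, "".join(self._text), self._box))
            self._box = None
```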
2 votes, 1 answer

Tesseract OCR with numeric tables

I need to OCR old statistical tables that contain numerical values for each town in a given area. I use Tesseract 4.0.0-beta.3, and in most cases I get acceptable results, but in some others the software fails to recognise the structure of the table…
Kazi bácsi
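A layout-level workaround that is often tried when Tesseract loses a table's structure: request per-word positions with `pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)` and rebuild the rows yourself by grouping words with similar `top` values. The sketch assumes that dict format (parallel lists keyed by 'text', 'left', 'top', and so on); `words_from_image_to_data`, `group_into_rows` and the 10-pixel tolerance are hypothetical. Comparing page segmentation modes such as `--psm 4` or `--psm 6` is another option.

```python
def words_from_image_to_data(data):
    """Turn image_to_data's dict of parallel lists into one dict per entry."""
    keys = list(data)
    return [dict(zip(keys, values)) for values in zip(*(data[k] for k in keys))]


def group_into_rows(words, row_tolerance=10):
    """Group OCR words into table rows by vertical position, then sort each row by x.

    A word joins the current row if its 'top' is within `row_tolerance` pixels of
    that row's first word; otherwise it starts a new row.
    """
    rows = []
    for word in sorted(words, key=lambda w: w["top"]):
        if not str(word["text"]).strip():
            continue  # image_to_data also reports empty entries for pages, blocks and lines
        if rows and abs(word["top"] - rows[-1][0]["top"]) <= row_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [[w["text"] for w in sorted(row, key=lambda w: w["left"])] for row in rows]
```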
2 votes, 0 answers

Is there a way to automatically export all zones/boxes as JPG/PNG images?

I would like to export a region from each page of a PDF document to create a shorter document (to make manual pre-processing easier). I've already "boxed"/"zoned" them with ABBYY FineReader 12. Does anyone know how to export…
2 votes, 1 answer

EAST text detection -215:Assertion failed (OpenCV Python)

When trying to use EAST text detector on some images, with OpenCV in Python on Windows 10, I get the following error: cv2.error: OpenCV(4.0.0) C:\projects\opencv-python\opencv\modules\dnn\src\dnn.cpp:835: error: (-215:Assertion failed)…
kr1zz
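One frequently reported cause of this assertion with the EAST model is an input whose width or height is not a multiple of 32, which the network requires. If that is the cause here (the error message is truncated, so this is an assumption), the blob size can be rounded to the nearest multiples of 32, keeping the scale ratios to map detected boxes back to the original image. The helper names below are hypothetical.

```python
def round_to_multiple_of_32(value: int) -> int:
    """Round a side length to the nearest multiple of 32, with a minimum of 32."""
    return max(32, int(round(value / 32.0)) * 32)


def east_input_size(width: int, height: int):
    """Return (new_w, new_h, ratio_w, ratio_h) for an EAST-compatible input size.

    The ratios convert box coordinates from the resized input back to the
    original image: x_original = x_resized * ratio_w.
    """
    new_w = round_to_multiple_of_32(width)
    new_h = round_to_multiple_of_32(height)
    return new_w, new_h, width / new_w, height / new_h
```

`(new_w, new_h)` would then be passed as the `size` argument of `cv2.dnn.blobFromImage`.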
2 votes, 0 answers

How to crop the region of a detected number (OCR) using Google Vision?

I wrote code to recognize the value of money. I'm using OCR from Mobile Vision to get the numbers and words; if they match (I set some conditions), the app plays a voice announcing the value. Now I want to experiment. I want this app to crop the…
otniel
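Once the recognizer returns a bounding box for the detected number, cropping amounts to slicing the image to that box, clamped to the image bounds and optionally widened by a margin. The question targets Android, but this sketch is in Python on a hypothetical list-of-rows image, and `crop_box` is a hypothetical name; on Android, `Bitmap.createBitmap(source, x, y, width, height)` performs the equivalent crop with the same clamped values.

```python
def crop_box(image, left, top, right, bottom, margin=0):
    """Crop `image` (a list of rows) to [left, right) x [top, bottom), clamped.

    `margin` widens the box on every side before clamping; an empty list is
    returned when the box lies entirely outside the image.
    """
    h, w = len(image), len(image[0])
    x0, y0 = max(0, left - margin), max(0, top - margin)
    x1, y1 = min(w, right + margin), min(h, bottom + margin)
    if x0 >= x1 or y0 >= y1:
        return []
    return [row[x0:x1] for row in image[y0:y1]]
```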