Questions tagged [python-tesseract]

Python-tesseract is a wrapper for Tesseract OCR that reads conventional image files (JPG, GIF, PNG, TIFF, etc.) and returns their text, data about that text, or even converts them to PDF.

Python-tesseract is a wrapper for OCR that allows any conventional image file (JPG, GIF, PNG, TIFF, etc.) to be read and decoded into usable text.

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006 when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.

1664 questions
0
votes
0 answers

Tesseract accuracy in screenshot

I'm trying to fetch score info from a Dota2 screenshot (disregard the "wrong" boxes on the borders), but I can't seem to get good enough accuracy. I'm applying these filters to the image: bw_image = cv2.bitwise_not(img) bw_image =…
Carlos Silva
  • 101
  • 2
  • 9
0
votes
1 answer

Use pytesseract to read content that is out of position

I am using pytesseract to read the content related to date/time. This works well when the content to be read is on the same line. However, in the following case, I am not even able to use OpenCV to identify the area containing information: image Can…
Pain
  • 1
  • 1
0
votes
0 answers

Can anyone recommend an approach for drawing a bounding box returned by pytesseract?

I can pull data from an image using pytesseract and obtain the bounding box for the text that it recognises. I would like to be able to plot the bounding boxes on the original image to help a manual checker to confirm that OCR has been carried out…
Dan Peel
  • 1
  • 1
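
For the bounding-box question above, one possible approach is to turn the per-word output of `pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)` into corner coordinates that any drawing routine can consume. This is a minimal sketch under that assumption: `word_boxes` and `min_conf` are hypothetical names, and `data` is assumed to carry the `text`, `conf`, `left`, `top`, `width` and `height` lists that call returns.

```python
def word_boxes(data, min_conf=0.0):
    """Return (left, top, right, bottom, text) for each recognised word.

    Rows with empty text or a confidence below `min_conf` are skipped;
    Tesseract reports -1 as the confidence of non-word rows.
    """
    boxes = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])  # may arrive as str, int or float
        if not text.strip() or conf < min_conf:
            continue
        left, top = int(data["left"][i]), int(data["top"][i])
        right = left + int(data["width"][i])
        bottom = top + int(data["height"][i])
        boxes.append((left, top, right, bottom, text))
    return boxes


# Assumed usage with OpenCV (img is the original image array):
#   for left, top, right, bottom, _ in word_boxes(data, min_conf=60):
#       cv2.rectangle(img, (left, top), (right, bottom), (0, 255, 0), 2)
```

Keeping the coordinate conversion separate from the drawing makes it easy to check the boxes before overlaying them for a manual reviewer.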
0
votes
0 answers

How to extract the text displayed in a dot matrix LED display

I need to extract the number from a display (LED dot matrix). Sample image: I am using the example code given by pytesseract to test, but it is failing. try: from PIL import Image except ImportError: import Image import pytesseract # If you…
Ram
  • 49
  • 1
  • 1
  • 13
0
votes
0 answers

How to clean text in image for Tesseract to read

I'm using pytesseract to read text from images, but the text is rotated and there is a light source that creates shadows. The code rotates the image half a degree each time expecting a match but some of the images I provide (all of them are quite…
0
votes
1 answer

Cleaning Image for Reading Numbers Pytesseract

I'm trying to read some entries from a table of data filled with a name and then columns of numbers. Here's the original picture: Between binarizing, converting to black/white, and just inverting, I found that inverting the image led to the best…
0
votes
0 answers

How to grab text in these images using pytesseract?

I need to get data in the form: Title of the image, Latitude, Longitude. I have tried pytesseract, but the resulting text is not accurate: I found letters in the text rather than latitude and longitude…
G.S. J
  • 233
  • 1
  • 8
0
votes
1 answer

Tesseract 4.0.0-beta.1 - Training

I want to train Tesseract 4.0.0, but when I searched for it, I only found training guides for version 3. Can someone suggest some blogs that explain Tesseract 4.0.0 training?
atheesh
  • 23
  • 1
  • 4
0
votes
1 answer

How to increase Pytesseract's accuracy in extracting digits

I am testing Pytesseract and using it to extract digits like the one below. The image is of fairly decent quality (200 dpi). However, when I run pytesseract, it gives me the result 456-/8-0000, where the digit 7 is misrecognized as '/'. While "/"…
Alex
  • 4,030
  • 8
  • 40
  • 62
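
One option often tried for digit-only fields like the one in this question is to restrict the characters Tesseract may return. The helper below is a minimal sketch, not a confirmed fix for the case above: `digit_config` is a hypothetical name, `--psm 7` (treat the image as a single text line) is an assumed layout, and `tessedit_char_whitelist` is reportedly ignored by the LSTM engine in Tesseract 4.0, so the result depends on the installed version.

```python
def digit_config(psm=7, allowed="0123456789-"):
    """Build a Tesseract config string limiting output to `allowed` characters.

    psm=7 assumes the image holds one line of text; adjust for other layouts.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={allowed}"


# Assumed usage (requires pytesseract and an image object `img`):
#   text = pytesseract.image_to_string(img, config=digit_config())
```

With "/" left out of the whitelist, the engine has to pick another character for that glyph, though nothing guarantees it will pick "7".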
0
votes
0 answers

pytesseract.image_to_string() doesn't give any output

I am using pytesseract to extract text from an image. from PIL import Image from pytesseract import image_to_string, image_to_boxes, image_to_data img =…
chink
  • 1,505
  • 3
  • 28
  • 70
0
votes
1 answer

pytesseract.pytesseract.TesseractError: (255, '')

string = pytesseract.image_to_string(res,lang ='eng',config = config) I am getting this error: pytesseract.pytesseract.TesseractError: (255, '') I am cropping the images and performing some image processing tasks. After that I want to do OCR on…
Vikas
  • 11
  • 2
0
votes
0 answers

How to extract date from multiple transaction receipts in python having no pattern

I have multiple transaction receipts and am trying to extract the invoice amount from each of these receipts. The problem is that the OCR I am using is not able to capture certain amounts from the document. I have used pillow and pytesseract…
0
votes
0 answers

How to remove black pixels around letters in Open-cv?

I want to be able to use tesseract to identify Scrabble letters. Right now, I am using an adaptive gaussian threshold and while the letters are looking nice, I cannot figure out how to remove the black area surrounding them. image =…
Alexander
  • 21
  • 4
0
votes
1 answer

Trying to recognize Captcha with OpenCV & Tesseract in python, but not good Accuracy

I'm trying to convert a Captcha to text. This captcha is not very difficult (I think). I open the image and convert it with OpenCV, to make it easy to recognize. I will show you an example. Example Captcha After OpenCV Captcha image =…
zenyatta
  • 97
  • 2
  • 9
0
votes
0 answers

Cache error while doing OCR on a directory of PDFs in python

I am trying to OCR an entire directory of PDF files using pytesseract and ImageMagick, but ImageMagick is consuming all my Temp folder space and finally I'm getting a cache error, i.e. "CacheError: unable to extend cache…
ajai biltu
  • 55
  • 6