Questions tagged [python-tesseract]

Python-tesseract is a wrapper for the Tesseract OCR engine that can read any conventional image file (JPG, GIF, PNG, TIFF, etc.) and return its text, word-level data about that text, or even a PDF version of it.
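The three kinds of output map to separate pytesseract functions: `image_to_string` for plain text, `image_to_data` for word-level TSV (block/paragraph/line numbers, bounding box, confidence, text) and `image_to_pdf_or_hocr` for PDF bytes. As a minimal sketch, assuming pytesseract and Pillow are installed and `page.png` is a hypothetical input file, the helper below turns the TSV string into word records; the helper name `parse_tsv` is illustrative, not part of pytesseract.

```python
import csv
import io


def parse_tsv(tsv_text, min_conf=0.0):
    """Turn the TSV string returned by pytesseract.image_to_data into word dicts.

    Structural rows (page, block, paragraph, line) have empty text and are skipped,
    as are words whose confidence is below min_conf.
    """
    words = []
    # QUOTE_NONE: OCR text may contain quote characters that are not CSV quoting.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        text = (row.get("text") or "").strip()
        if not text:
            continue
        conf = float(row["conf"])
        if conf < min_conf:
            continue
        words.append({
            "text": text,
            "conf": conf,
            "box": (int(row["left"]), int(row["top"]), int(row["width"]), int(row["height"])),
            "line": (int(row["block_num"]), int(row["par_num"]), int(row["line_num"])),
        })
    return words


# Typical calls (assumes pytesseract and Pillow; "page.png" is hypothetical):
#   from PIL import Image
#   import pytesseract
#   img = Image.open("page.png")
#   text = pytesseract.image_to_string(img)                        # plain text
#   words = parse_tsv(pytesseract.image_to_data(img))              # boxes + confidence
#   pdf = pytesseract.image_to_pdf_or_hocr(img, extension="pdf")   # PDF bytes
```

Filtering on `min_conf` is one simple way to drop low-confidence noise before further processing; the right threshold depends on your images.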

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006, when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.

1664 questions
4 votes, 6 answers

Pyinstaller and Tesseract OCR

I am using Tesseract OCR for my program, and I am going to convert it into a single .exe file using PyInstaller. The problem is that in order for Tesseract to work, I need to reference the path to the program installed on my computer, like this:…
Mirrah (125 reputation)
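Inside a PyInstaller build, `sys._MEIPASS` points at the folder the bundled files were unpacked to (a temporary folder for one-file builds). A minimal sketch, assuming you bundle the Tesseract install folder (including its `tessdata` data) into the build under the hypothetical name `Tesseract-OCR`, for example with PyInstaller's `--add-data` option, is to point pytesseract at that copy when frozen and at the normal install otherwise. The default Windows path below is an assumption; adjust it to your machine.

```python
import os
import sys

# Assumed install location on the development machine (adjust as needed).
DEFAULT_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Hypothetical folder name used when bundling Tesseract with --add-data.
BUNDLED_DIR = "Tesseract-OCR"


def resolve_tesseract_cmd(default=DEFAULT_TESSERACT, bundled_dir=BUNDLED_DIR):
    """Return the tesseract executable path for frozen and non-frozen runs."""
    base = getattr(sys, "_MEIPASS", None)  # only set inside a PyInstaller bundle
    if base is None:
        return default
    return os.path.join(base, bundled_dir, "tesseract.exe")


# Usage (requires pytesseract):
#   import pytesseract
#   pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd()
```

Keeping the lookup in one function means the same script runs unchanged from source and from the built .exe.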
4 votes, 1 answer

Tesseract OCR image recognition failed because of `Warning: Invalid resolution` error

I tried to detect text from an image where I draw bounding boxes around selected characters and stitch them together to form another image, as below: I used cv2 to draw bounding boxes around the characters using the following code: cnts =…
Mayank (1,364 reputation)
4 votes, 1 answer

python pytesseract.image_to_string can't read text in image

I am using Python 3.7 and Tesseract-OCR version 5 on my Windows 10 box. I have pictures containing numbers. However, even though they are super clear to the human eye, Tesseract can't extract them correctly. Some give me a couple of correct…
Difan Zhao (379 reputation)
4 votes, 1 answer

What causes pytesseract to read either the top or bottom text-line of a dual-line image depending on whether opencv or pillow is used?

EDIT: I forgot to preprocess the image, which solves the reading issue (thanks to Nathancy). I am still wondering what makes Tesseract read only the top OR the bottom line of an unprocessed image (same image, two different outcomes). Original: I have an image…
4 votes, 2 answers

OCR on floorplan screenshots with pytesseract and OpenCV

I am trying to write a function that will take a JPG of a floorplan of a house and use OCR to extract the square footage that is written somewhere on the image: `import requests`, `from PIL import Image`, `import pytesseract`, `import pandas as`…
Harvs (503 reputation)
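Once `image_to_string` has produced text, pulling out the square footage is a text-parsing step. A minimal sketch, assuming the floorplan labels the area in a form like `1,234 sq ft`, `1234 sq. ft.`, `250 SQFT` or `1,500 square feet` (other layouts such as `ft²` or metric areas would need extra patterns); the names `SQFT_PATTERN` and `extract_square_footage` are illustrative:

```python
import re

# Assumed label formats: "1,234 sq ft", "1234 sq. ft.", "250 sqft", "1,500 square feet".
SQFT_PATTERN = re.compile(
    r"(\d[\d,]*(?:\.\d+)?)\s*(?:sq\.?\s*ft\.?|square\s+feet)",
    re.IGNORECASE,
)


def extract_square_footage(text):
    """Return every square-footage value found in OCR text, as floats."""
    return [float(m.group(1).replace(",", "")) for m in SQFT_PATTERN.finditer(text)]


# Usage (assumes pytesseract and Pillow; "floorplan.jpg" is hypothetical):
#   text = pytesseract.image_to_string(Image.open("floorplan.jpg"))
#   areas = extract_square_footage(text)
```

Returning all matches rather than the first leaves the choice (largest value, total label, and so on) to the caller, since floorplans often list room areas as well as the total.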
4 votes, 3 answers

How to extract decimal in image with Pytesseract

Above is the image. I have tried everything I could find on SO or Google, but nothing seems to work. I cannot get the exact value in the image: I should get 2.10, but instead I always get 210. And it is not limited to this image; any image which has…
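When the number of decimal places is known in advance (an assumption; here two, as in 2.10), one workaround is to reinsert the point after OCR instead of relying on Tesseract to see a small dot. This is a post-processing sketch only, not a fix for recognition itself; the function name `restore_decimal` is hypothetical.

```python
def restore_decimal(digits, places=2):
    """Reinsert a dropped decimal point, e.g. '210' -> '2.10' when places=2.

    Assumes the value always has exactly `places` decimal places.
    Output that already contains a point is returned unchanged.
    """
    digits = digits.strip()
    if "." in digits or places <= 0:
        return digits
    if not digits.isdigit():
        raise ValueError(f"expected only digits, got {digits!r}")
    # Left-pad so short values keep a leading zero: '5' -> '005' -> '0.05'.
    digits = digits.rjust(places + 1, "0")
    return f"{digits[:-places]}.{digits[-places:]}"
```

If the decimal count varies between images, this trick does not apply, and preprocessing (upscaling, thresholding) plus allowing `.` in a character whitelist is the more direct route.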
4 votes, 0 answers

Retaining tabular structure after extracting data using OCR Pytesseract

I am using OCR Pytesseract to extract data from an image which has tabular data. I am extracting it to a text file and I wish to store it in an Excel sheet; I couldn't store it directly in an Excel sheet. But the problem I am encountering is that…
developer (257 reputation)
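One way to keep the row structure is to request word boxes with `pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)` and rebuild rows from the coordinates, then write CSV, which Excel opens directly. The sketch below groups words by vertical position (Tesseract may put table columns in different blocks, so block/line numbers alone can split a row) and starts a new cell at wide horizontal gaps. The thresholds `row_tol` and `min_gap` are assumed defaults that will need tuning, and the function names are illustrative.

```python
import csv
import io


def rows_from_data(data, row_tol=10, min_gap=40):
    """Rebuild table rows from image_to_data(..., output_type=Output.DICT).

    After sorting words by vertical centre, a word within row_tol pixels of the
    previous word joins its row; inside a row, a horizontal gap larger than
    min_gap pixels starts a new cell.
    """
    words = []
    for i, raw in enumerate(data["text"]):
        text = str(raw).strip()
        if not text:
            continue  # skip empty structural entries
        centre_y = data["top"][i] + data["height"][i] / 2
        words.append((centre_y, data["left"][i], data["width"][i], text))
    words.sort()

    rows = []
    for word in words:
        if rows and abs(word[0] - rows[-1][-1][0]) <= row_tol:
            rows[-1].append(word)
        else:
            rows.append([word])

    table = []
    for row in rows:
        row.sort(key=lambda w: w[1])  # left to right
        cells, prev_right = [], None
        for _, left, width, text in row:
            if prev_right is None or left - prev_right > min_gap:
                cells.append(text)
            else:
                cells[-1] += " " + text
            prev_right = left + width
        table.append(cells)
    return table


def to_csv(table):
    """Serialise rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(table)
    return buf.getvalue()


# Usage (assumes pytesseract and Pillow; "table.png" is hypothetical):
#   data = pytesseract.image_to_data(Image.open("table.png"),
#                                    output_type=pytesseract.Output.DICT)
#   with open("table.csv", "w", newline="") as f:
#       f.write(to_csv(rows_from_data(data)))
```

This works for simple, well-aligned tables; cells that wrap onto several lines will come out as extra rows.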
4 votes, 1 answer

Pytesseract does not recognize when it's just a letter

I need to recognize only one letter, but OCR does not recognize it when it's just a single letter. In this case I am trying to recognize the letter H, but nothing shows up. What can I do to make it work? `from PIL import Image`, `from pytesseract import *`, import…
4 votes, 3 answers

Why does pytesseract fail to recognize digits in this simple image?

I'm trying to use pytesseract to recognize two numbers from an image. I have tried `--psm 6` up to 10. I have tried `-c tessedit_char_whitelist=0123456789`. None of the above returns the number 49. The closest I got returned 4 without the 9. Do you have any…
Povilas (627 reputation)
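Options like `--psm` and `tessedit_char_whitelist` reach Tesseract through pytesseract's `config` argument, which pytesseract splits shell-style, so a stray quote (as in `0123456789'`) can change what Tesseract receives. A small helper, sketched below with the hypothetical name `build_config`, keeps the page segmentation mode, an optional whitelist and an optional `--dpi` value together. Mode 7 treats the image as a single text line and mode 10 as a single character; `--dpi` gives a resolution for images without DPI metadata. Whitelist support with the LSTM engine varies between Tesseract versions, so check that yours honours it.

```python
def build_config(psm=None, whitelist=None, dpi=None):
    """Assemble a pytesseract `config` string from common Tesseract options.

    psm: page segmentation mode (7 = single text line, 10 = single character).
    whitelist: characters Tesseract may output, e.g. "0123456789.".
    dpi: resolution to assume for images that carry no DPI metadata.
    """
    parts = []
    if psm is not None:
        parts.append(f"--psm {int(psm)}")
    if dpi is not None:
        parts.append(f"--dpi {int(dpi)}")
    if whitelist:
        # The config string is split shell-style, so spaces or quotes would break it.
        if any(ch.isspace() or ch in "'\"" for ch in whitelist):
            raise ValueError("whitelist must not contain whitespace or quotes")
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


# Usage (assumes pytesseract and Pillow; "digits.png" is hypothetical):
#   cfg = build_config(psm=7, whitelist="0123456789")
#   text = pytesseract.image_to_string(Image.open("digits.png"), config=cfg)
```

Which mode works best depends on the image, so it is worth looping over a few `psm` values on a sample and comparing the output.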
4 votes, 0 answers

PyTesseract incredibly slow to process single image

I have Tesseract running in Python via pytesseract. Using an image of a newspaper article which happens to contain around 600 words, the `pytesseract.image_to_string` function takes around 20 seconds to complete. The eventual results are great, but…
user3795126 (109 reputation)
4 votes, 1 answer

Reading low resolution image with pytesseract

I'm trying to read some stats off manually cropped sections of tables in PDF files. Here is the image I'm trying to process. The current result I get has most of the numbers but not all of the text, as seen below: Hmuwinu'fg. cm’:…
AndreK (41 reputation)
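Tesseract's own guidance on improving output quality says it works best on images of at least 300 DPI, so a common first step for small crops is to upscale before OCR. A sketch assuming Pillow, and assuming you know (or can estimate) the DPI the crop was rendered at; the 300 DPI target comes from that guidance, while the LANCZOS filter and the function names are choices made here, not requirements:

```python
def upscale_factor(source_dpi, target_dpi=300):
    """Scale factor that brings source_dpi up to target_dpi (never shrinks)."""
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    return max(1.0, target_dpi / source_dpi)


def upscale_for_ocr(img, source_dpi, target_dpi=300):
    """Return a resized copy of a Pillow image; Pillow is imported lazily."""
    from PIL import Image  # requires Pillow

    factor = upscale_factor(source_dpi, target_dpi)
    if factor == 1.0:
        return img
    size = (round(img.width * factor), round(img.height * factor))
    return img.resize(size, Image.LANCZOS)


# Usage (assumes pytesseract and Pillow; "crop.png" at 72 DPI is hypothetical):
#   big = upscale_for_ocr(Image.open("crop.png"), source_dpi=72)
#   text = pytesseract.image_to_string(big)
```

Upscaling cannot recover detail that was never captured, so if the crops come from PDFs, rendering the page at a higher resolution before cropping is usually the better option.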
4 votes, 3 answers

How to find, rotate and crop a section of text in OpenCV (Python)

I'm struggling with a project that takes an image of a fairly clear font (from a label, for example), reads the "text region" and outputs it as a string using OCR (Tesseract, for instance). Now I've made quite some progress with the thing as I added…
MikeLemo (521 reputation)
4 votes, 2 answers

TesseractNotFoundError: tesseract is not installed or it's not in your path

I am trying to use Tesseract OCR to print text from an image, but I am getting the above error. I have installed Tesseract OCR using https://github.com/UB-Mannheim/tesseract/wiki and pytesseract in the Anaconda prompt using pip install pytesseract…
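`pip install pytesseract` installs only the Python wrapper; pytesseract raises `TesseractNotFoundError` when it cannot run the separate `tesseract` executable. When that executable is installed but not on `PATH`, setting `pytesseract.pytesseract.tesseract_cmd` to its full path is the usual workaround. The sketch below checks `PATH` first and then a list of candidate locations; the candidate paths (including `C:\Program Files\Tesseract-OCR\tesseract.exe` as the assumed UB Mannheim installer default) and the name `find_tesseract` are assumptions to verify on your machine.

```python
import os
import shutil

# Assumed candidate locations; adjust to where Tesseract is actually installed.
CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/bin/tesseract",
    "/usr/local/bin/tesseract",
    "/opt/homebrew/bin/tesseract",
]


def find_tesseract(candidates=CANDIDATES):
    """Return a path to the tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")  # respects PATH (and PATHEXT on Windows)
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Usage (requires pytesseract):
#   import pytesseract
#   cmd = find_tesseract()
#   if cmd is None:
#       raise SystemExit("Tesseract executable not found; install it first")
#   pytesseract.pytesseract.tesseract_cmd = cmd
```

Failing early with a clear message is easier to debug than the exception raised deep inside an OCR call.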
4 votes, 0 answers

Use Pytesseract to Extract Text into Table Arrays Given the Coordinates of the Table Structure

I want to extract text from a scanned table with Tesseract and put it into arrays that have the same structure as the table. I already used OpenCV to detect the table structure, and obtained the coordinates of the table joints as well as the…
Bec Zhao (35 reputation)
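With the cell rectangles already known from OpenCV, one way to fill the arrays is to take word boxes from `image_to_data` and drop each word into the cell that contains its centre point. The sketch assumes cells are given as `(row, col, x, y, w, h)` tuples in image pixels (a hypothetical format; adapt it to how the joints were stored) and words as `(text, left, top, width, height)`; `fill_table` is an illustrative name.

```python
def fill_table(cells, words, n_rows, n_cols):
    """Place OCR words into an n_rows x n_cols grid using cell rectangles.

    cells: iterable of (row, col, x, y, w, h) in image pixels (assumed format).
    words: iterable of (text, left, top, width, height), e.g. from image_to_data.
    Words whose centre falls outside every cell are ignored.
    """
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    # Visit words top-to-bottom, then left-to-right, so multi-word cells read in order.
    for text, left, top, width, height in sorted(words, key=lambda w: (w[2], w[1])):
        cx, cy = left + width / 2, top + height / 2
        for row, col, x, y, w, h in cells:
            if x <= cx < x + w and y <= cy < y + h:
                grid[row][col] = (grid[row][col] + " " + text).strip()
                break
    return grid
```

An alternative with the same inputs is to crop each cell and run `image_to_string` on it with a single-line page segmentation mode; that costs one Tesseract call per cell but avoids words spilling across cell borders.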
4 votes, 0 answers

Tesseract Unknown Font training

I have been trying to train Tesseract 4.0 to recognise a "customized" font from an engineering blueprint. I've followed the necessary steps using Training Tesseract from here…
Ankita (41 reputation)