Questions tagged [python-tesseract]

Python-tesseract is a wrapper for the Tesseract OCR engine. It can read any conventional image file (JPG, GIF, PNG, TIFF, etc.) and return its text, data about that text (such as word positions and confidence values), or even convert it to a PDF.

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006, when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.
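
As a minimal sketch of the three kinds of output described above, assuming pytesseract, Pillow and the Tesseract binary are installed: `image_to_string`, `image_to_data` (with `Output.DICT`) and `image_to_pdf_or_hocr` are pytesseract functions, while `confident_words`, its default threshold of 60 and `ocr_all_outputs` are illustrative names, not part of the library.

```python
from typing import Dict, List


def confident_words(data: Dict[str, list], min_conf: float = 60.0) -> List[str]:
    """Keep non-empty words whose confidence is at least min_conf.

    `data` is shaped like the dict returned by
    pytesseract.image_to_data(..., output_type=Output.DICT): parallel lists
    keyed by column name. Rows that are not words carry a confidence of -1.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # conf may be an int, float or str depending on the pytesseract version
        if text.strip() and float(conf) >= min_conf:
            words.append(text)
    return words


def ocr_all_outputs(path: str):
    """Return plain text, high-confidence words and PDF bytes for one image."""
    # Imported lazily so the helper above works without Tesseract installed
    import pytesseract
    from PIL import Image

    image = Image.open(path)
    text = pytesseract.image_to_string(image)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    pdf_bytes = pytesseract.image_to_pdf_or_hocr(image, extension="pdf")
    return text, confident_words(data), pdf_bytes
```

Writing `pdf_bytes` to a file opened in binary mode (`"wb"`) produces a searchable PDF of the page.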

1664 questions

7 votes, 3 answers

No such file or directory: 'tesseract': 'tesseract' even though the location of tesseract is specified in pytesseract.py

So I have been working on this problem for a while. Others have asked similar questions, but nothing has worked for me: I am trying to use pytesseract for a project and I have it installed under…
Charlie22
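
The usual fix for this error is to point pytesseract at the binary through `pytesseract.pytesseract.tesseract_cmd` (the same attribute used in a later question on this page) instead of editing pytesseract.py. A sketch, assuming Tesseract is installed somewhere on the machine: it tries `shutil.which` first, then a list of candidate paths. The paths are common defaults rather than guaranteed locations, and `locate_tesseract` and `configure_pytesseract` are hypothetical helpers.

```python
import os
import shutil
from typing import Iterable, Optional

# Common install locations (assumed defaults; adjust for your machine)
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/local/bin/tesseract",
    "/opt/homebrew/bin/tesseract",
    "/usr/bin/tesseract",
)


def locate_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the first usable tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def configure_pytesseract() -> str:
    """Point pytesseract at the binary found above and return its path."""
    path = locate_tesseract()
    if path is None:
        raise FileNotFoundError("tesseract not found; install it or add it to PATH")
    import pytesseract  # lazy import: only needed once a binary was found

    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once at startup surfaces a missing installation as a clear error before any OCR call is made.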

7 votes, 1 answer

How to use trained data with pytesseract?

Using this tool, http://trainyourtesseract.com/, I would like to be able to use new fonts with pytesseract. The tool gives me a file called *.traineddata. Right now I'm using this simple script: try: import Image except ImportError: from PIL…
Simon Breton
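
Tesseract loads a model named `<lang>.traineddata` from its tessdata directory, so a downloaded `*.traineddata` file can be used by passing its base name as `lang` and, when the file is not in the default tessdata folder, pointing Tesseract at its folder with the `--tessdata-dir` option. A sketch under those assumptions; `myfont.traineddata`, `traineddata_args` and `ocr_with_custom_model` are hypothetical names.

```python
from pathlib import Path
from typing import Tuple


def traineddata_args(traineddata_file: str) -> Tuple[str, str]:
    """Return (lang, config) for pytesseract from a .traineddata path."""
    path = Path(traineddata_file)
    if path.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {path.name}")
    lang = path.stem  # "myfont.traineddata" -> "myfont"
    # Quote the directory so paths containing spaces survive argument splitting
    config = f'--tessdata-dir "{path.parent.resolve()}"'
    return lang, config


def ocr_with_custom_model(image_path: str, traineddata_file: str) -> str:
    """OCR one image using a model file stored outside the default tessdata."""
    import pytesseract  # lazy import
    from PIL import Image

    lang, config = traineddata_args(traineddata_file)
    return pytesseract.image_to_string(Image.open(image_path), lang=lang, config=config)
```

Copying the file into the default tessdata folder instead removes the need for `--tessdata-dir`; only `lang="myfont"` is then required.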

7 votes, 5 answers

leptonica/allheaders.h file not found (gcc error) on install of tesseract-ocr

I am trying to run the following code on my Mac: import Image import pytesseract im = Image.open('test.png') print(pytesseract.image_to_string(im)) Following the question from here: pytesseract-no such file or directory error, I need to install…
Jase Villam

7 votes, 2 answers

UnicodeDecodeError with Tesseract OCR in Python

I am trying to extract text from an image file using Tesseract OCR in Python, but I'm facing an error that I can't figure out how to deal with. My environment is fine, as I tested some sample images with the OCR in Python. Here is the code from…
Nwawel A Iroume

7 votes, 1 answer

image_to_string doesn't work on Mac

I'm trying to follow this example of pytesser (link) on Mac OS X Mavericks. >>> from pytesser import * >>> im = Image.open('phototest.tif') >>> text = image_to_string(im) But on the last line I get this error message: Traceback (most recent call…
Filipe Ferminiano

6 votes, 1 answer

pytesseract - Invalid resolution 0 dpi

I am using pytesseract v5.0 and I am rotating the image with OpenCV and then passing it to pytesseract.image_to_osd(). There are some images that work with the image_to_osd, but other images do not and the program gives me the following…
Anil B
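
This warning typically appears when the image carries no DPI metadata, which is the case for NumPy arrays coming from OpenCV. One commonly suggested workaround, sketched here as an assumption rather than a guaranteed fix, is to state the resolution with Tesseract's `--dpi` option through the `config` argument. `parse_osd` is a hypothetical parser for the `Key: value` lines that `image_to_osd` returns as a string by default, and 300 is an assumed DPI.

```python
from typing import Dict, Union


def parse_osd(osd_text: str) -> Dict[str, Union[int, float, str]]:
    """Parse 'Key: value' lines from Tesseract OSD output into a dict."""
    result: Dict[str, Union[int, float, str]] = {}
    for line in osd_text.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        for cast in (int, float):
            try:
                result[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            result[key] = value  # non-numeric values such as the script name
    return result


def detect_rotation(image, dpi: int = 300) -> int:
    """Return the rotation Tesseract suggests, supplying a DPI explicitly."""
    import pytesseract  # lazy import

    osd = pytesseract.image_to_osd(image, config=f"--dpi {dpi}")
    return int(parse_osd(osd)["Rotate"])
```

OSD also needs enough text on the page to decide; on images with very little text it may still fail regardless of the DPI setting.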

6 votes, 3 answers

Pytesseract is very slow for real time OCR, any way to optimise my code?

I'm trying to create a real time OCR in python using mss and pytesseract. So far, I've been able to capture my entire screen which has a steady FPS of 30. If I wanted to capture a smaller area of around 500x500, I've been able to get 100+…
Vamsi
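
Tesseract itself is usually the bottleneck in a loop like this, so a cheap improvement is to call it less often: OCR only a small region, and skip frames whose pixels have not changed since the last call. A sketch assuming `mss`, Pillow and pytesseract are installed; the `--psm 6` default, the example region and the `ChangeDetector`/`ocr_region_loop` names are illustrative, not tuned values.

```python
import hashlib
from typing import Optional


class ChangeDetector:
    """Tracks a digest of the last frame so identical frames can be skipped."""

    def __init__(self) -> None:
        self._last: Optional[str] = None

    def changed(self, frame_bytes: bytes) -> bool:
        digest = hashlib.blake2b(frame_bytes, digest_size=16).hexdigest()
        if digest == self._last:
            return False
        self._last = digest
        return True


def ocr_region_loop(region: dict, config: str = "--psm 6") -> None:
    """Grab `region` repeatedly and print OCR text only when the pixels change."""
    import mss  # lazy imports: screen capture, imaging and OCR
    import pytesseract
    from PIL import Image

    detector = ChangeDetector()
    with mss.mss() as sct:
        while True:
            # region example: {"top": 100, "left": 100, "width": 500, "height": 500}
            shot = sct.grab(region)
            if not detector.changed(shot.rgb):
                continue
            image = Image.frombytes("RGB", shot.size, shot.rgb)
            print(pytesseract.image_to_string(image, config=config))
```

An exact digest only skips truly identical frames; a blinking cursor or animated content will still trigger OCR, and handling that would need a tolerant comparison that this sketch does not attempt.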

6 votes, 1 answer

How to configure pytesseract to support text detection for non-English languages on Windows 10?

I have tried pytesseract for English. It works fine and generates the expected result. But for languages other than English (e.g. Arabic), it fails and gives the following error: TesseractError: (1, 'Error opening data file…
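
`Error opening data file` generally means the requested `<lang>.traineddata` (for Arabic, `ara.traineddata`) is missing from Tesseract's tessdata folder, so the first step is to put that file there. Once installed, languages are passed as `lang`, joined with `+` for mixed text. A sketch using pytesseract's `get_languages`; `build_lang` and `ocr_multilingual` are hypothetical helpers that fail early with a readable message instead of letting Tesseract error out.

```python
from typing import Iterable, Sequence


def build_lang(requested: Sequence[str], installed: Iterable[str]) -> str:
    """Join language codes with '+', raising if any model is missing."""
    available = set(installed)
    missing = [code for code in requested if code not in available]
    if missing:
        raise LookupError(f"missing traineddata for: {', '.join(missing)}")
    return "+".join(requested)


def ocr_multilingual(image_path: str, languages: Sequence[str] = ("ara", "eng")) -> str:
    """OCR an image with several languages after checking they are installed."""
    import pytesseract  # lazy import
    from PIL import Image

    lang = build_lang(languages, pytesseract.get_languages(config=""))
    return pytesseract.image_to_string(Image.open(image_path), lang=lang)
```

Listing only the languages actually present in the image tends to be faster and more accurate than enabling many at once.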

6 votes, 2 answers

How to use tessdata_best with tesseract (pytesseract)? What are the arguments and procedure?

TL;DR: How do I install tessdata_best to use with pytesseract inside conda on Ubuntu 18? I have been using pytesseract inside a conda environment for quite some time, but there is a need to improve the accuracy and I found out that tessdata_best gives you…
Deshwal
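
tessdata_best is a set of `.traineddata` files distributed separately from the default models, so one approach is to download the files for the languages you need into a directory of your own and point Tesseract at it with `--tessdata-dir` (setting the `TESSDATA_PREFIX` environment variable is an alternative). These models are LSTM-only, so the sketch requests the LSTM engine with `--oem 1`. The directory layout and the `installed_models`/`best_model_config` helpers are assumptions.

```python
from pathlib import Path
from typing import List


def installed_models(tessdata_dir: str) -> List[str]:
    """List language codes for every *.traineddata file in a directory."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def best_model_config(tessdata_dir: str, lang: str) -> str:
    """Build a config string for a tessdata_best directory, checking the model exists."""
    if lang not in installed_models(tessdata_dir):
        raise FileNotFoundError(f"{lang}.traineddata not found in {tessdata_dir}")
    # tessdata_best models are LSTM-only, so request the LSTM engine explicitly
    return f'--oem 1 --tessdata-dir "{Path(tessdata_dir).resolve()}"'
```

Usage would then look like `pytesseract.image_to_string(img, lang="eng", config=best_model_config("/path/to/tessdata_best", "eng"))`, where the path is a placeholder for wherever the files were downloaded. The conda environment does not change this, since pytesseract only shells out to the `tesseract` binary.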

6 votes, 0 answers

How to extract text with math symbols using pytesseract/tesseract version 4.0 (using equ.traineddata). 'equ' is no longer supported

How can I use the tesseract to extract the mathematical equation? While reading the image given below: after using: img = cv2.imread(IN_PATH+'sample1.png') pytesseract.image_to_string(img) I get the result as: 'The value of 7/8144 is\n- (a) 20.2…
Deshwal

6 votes, 1 answer

pytesseract not recognizing text as expected?

I am trying to run a simple license plate image through opencv and pytesseract to get the text but I am having trouble getting anything out of it. Following the tutorial…
LoF10
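
For a cropped plate, telling Tesseract to expect a single line of text (`--psm 7`) and restricting the character set often helps more than the default page segmentation. A sketch assuming plates contain only uppercase Latin letters and digits; `tessedit_char_whitelist` is a Tesseract variable passed with `-c` (recent Tesseract versions honour it with the LSTM engine), while `plate_config` and `clean_plate` are hypothetical helpers.

```python
import re
import string

PLATE_CHARS = string.ascii_uppercase + string.digits  # assumed plate alphabet


def plate_config(psm: int = 7, whitelist: str = PLATE_CHARS) -> str:
    """Config string: one line of text, restricted to the given characters."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def clean_plate(raw: str) -> str:
    """Drop whitespace, form feeds and stray punctuation from OCR output."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())
```

In use, this would be `clean_plate(pytesseract.image_to_string(plate_img, config=plate_config()))`, where `plate_img` is the grayscale, thresholded crop produced by the OpenCV steps in the question.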

6 votes, 1 answer

How to improve OCR with Pytesseract text recognition?

Hi, I am looking to improve pytesseract's performance at digit recognition. I take my raw image and split it into parts that look like this: The size can vary. To these parts I apply some pre-processing methods like so: image = cv2.imread(im,…
tepsupek

6 votes, 2 answers

How to extract text from table in image?

I have data in a structured table image. The data looks like this: I tried to extract the text from this image using this code: import pytesseract from PIL import Image value=Image.open("data/pic_table3.png") text =…
Afianh
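
`image_to_string` flattens the table layout, whereas `image_to_data` keeps each word's block, paragraph and line numbers plus its `left` and `top` coordinates, which is enough to rebuild rows. A rough sketch assuming each table row is detected as one Tesseract line (often true for simple, well-spaced tables, not for merged or multi-line cells); `rows_from_data` is a hypothetical helper that takes the `Output.DICT` result.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

LineKey = Tuple[int, int, int]


def rows_from_data(data: Dict[str, list]) -> List[List[str]]:
    """Group image_to_data words into rows, ordered top-to-bottom, left-to-right."""
    lines: Dict[LineKey, List[Tuple[int, str]]] = defaultdict(list)
    tops: Dict[LineKey, int] = {}
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # skip block/paragraph/line rows that carry no text
        key = (int(data["block_num"][i]), int(data["par_num"][i]), int(data["line_num"][i]))
        lines[key].append((int(data["left"][i]), text))
        top = int(data["top"][i])
        tops[key] = min(tops.get(key, top), top)
    ordered = sorted(lines, key=lambda k: tops[k])
    return [[word for _, word in sorted(lines[k])] for k in ordered]
```

The rows come back ragged: a cell that contains several words yields several entries, so aligning them into columns (for example by clustering `left` values) is left to the caller.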

6 votes, 1 answer

How to extract text or numbers from images using python

I want to extract text (mainly numbers) from images like this I tried this code import pytesseract from PIL import Image pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' img = Image.open('1.jpg') text =…
Hosam Gamal
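
When only numbers matter, restricting Tesseract to numeric characters and then running a regular expression over its output keeps stray letters out of the result. A sketch; the whitelist (digits, `.` and `-`), `--psm 6`, `NUMERIC_CONFIG` and `extract_numbers` are assumptions to adapt to your images.

```python
import re
from typing import List

# Digits plus decimal point and minus sign (assumed character set)
NUMERIC_CONFIG = "--psm 6 -c tessedit_char_whitelist=0123456789.-"

NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def extract_numbers(ocr_text: str) -> List[float]:
    """Pull every integer or decimal number out of OCR output."""
    return [float(match) for match in NUMBER_RE.findall(ocr_text)]
```

Combined with the code from the question, this becomes `extract_numbers(pytesseract.image_to_string(img, config=NUMERIC_CONFIG))`.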

6 votes, 1 answer

How to increase image contrast, convert to grayscale, and then get all characters exactly with PIL and pytesseract?

Please download the attachment here and save it as /tmp/target.jpg. You can see that the JPG contains 0244R. I extract the string with the Python code below: from PIL import Image import pytesseract import cv2 filename = "/tmp/target.jpg" image =…
showkey
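
A common Pillow-only pipeline for this is to convert to grayscale, stretch the contrast with `ImageOps.autocontrast`, enlarge small text, then binarize with a fixed threshold through `Image.point`. A sketch; the threshold of 150 and the 2x scale are guesses to tune for `/tmp/target.jpg`, and `threshold_table` and `preprocess` are hypothetical helpers.

```python
from typing import List


def threshold_table(threshold: int) -> List[int]:
    """256-entry lookup table mapping gray levels to black (0) or white (255)."""
    if not 0 <= threshold <= 255:
        raise ValueError("threshold must be in 0..255")
    return [0 if level < threshold else 255 for level in range(256)]


def preprocess(path: str, threshold: int = 150, scale: int = 2):
    """Grayscale, autocontrast, upscale and binarize an image for OCR."""
    from PIL import Image, ImageOps  # lazy import

    image = Image.open(path)
    gray = ImageOps.grayscale(image)
    stretched = ImageOps.autocontrast(gray)
    enlarged = stretched.resize(
        (stretched.width * scale, stretched.height * scale), Image.LANCZOS
    )
    return enlarged.point(threshold_table(threshold))
```

Since the target is a single short string, `pytesseract.image_to_string(preprocess("/tmp/target.jpg"), config="--psm 7")` (single text line) is a reasonable first attempt; whether it yields exactly 0244R depends on the image.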