
I'm trying to use pytesseract for some very basic character recognition. When I run the following code on Linux, the output makes sense:

import matplotlib.pyplot as plt
import pandas as pd

import sys
import pytesseract
# On Windows, point pytesseract at the Tesseract executable explicitly.
if sys.platform == 'win32':
    tesseract_path = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    pytesseract.pytesseract.tesseract_cmd = tesseract_path

img = pd.read_csv('https://www.dropbox.com/s/fcs5bcmy73j75o0/two.csv?dl=1').values
fig, ax = plt.subplots()
ax.imshow(img.astype(float), cmap='gray')

print('identified as {}'.format(pytesseract.image_to_string(img.astype(float))))

linux screenshot

But the same call to pytesseract.image_to_string on Windows returns an empty string:

windows screenshot

Code is executed on both machines in a Python 3 environment.

Is there an obvious step I might have missed when installing Tesseract on my Windows machine that would explain this behavior?

Tesseract in Windows was installed using the following installer: https://github.com/UB-Mannheim/tesseract/wiki

In Linux, I simply used: yum install tesseract

ollerend

1 Answer


I ran into the same problem. It turned out that once I pointed tesseract_cmd at the Tesseract-OCR v5.0 executable (which I installed from here), it worked perfectly.

pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\minh.nguyen\\AppData\\Local\\Tesseract-OCR\\tesseract.exe'
  • Note that I use Tesseract v5 instead of v4.1 because it gave better results.
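
If several Tesseract installs may be present, it can help to confirm which executable pytesseract is actually pointed at and which version it reports. The following is only a rough sketch: the helper names resolve_tesseract_cmd and configure_pytesseract are made up for illustration, and the candidate paths are simply the two locations mentioned in this thread, so adjust them to your own machine.

```python
import os
import shutil


def resolve_tesseract_cmd(candidates, fallback="tesseract"):
    """Return the first candidate that exists as a file.

    If none exists, fall back to looking up `fallback` on PATH
    (returns None when nothing is found). Hypothetical helper.
    """
    for path in candidates:
        if os.path.isfile(path):
            return path
    return shutil.which(fallback)


def configure_pytesseract(candidates):
    """Point pytesseract at a resolved executable and return its version.

    Hypothetical helper; pytesseract is imported lazily so that the
    path lookup above can be used without it installed.
    """
    import pytesseract

    cmd = resolve_tesseract_cmd(candidates)
    if cmd is None:
        raise FileNotFoundError("no tesseract executable found")
    pytesseract.pytesseract.tesseract_cmd = cmd
    # get_tesseract_version() runs the executable, so it also confirms
    # that the path is usable.
    return cmd, pytesseract.get_tesseract_version()


# Assumed install locations, taken from the question and this answer.
CANDIDATES = [
    r"C:\Users\minh.nguyen\AppData\Local\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]
```

Calling configure_pytesseract(CANDIDATES) before image_to_string and printing the returned path and version on both machines should show whether Linux and Windows are running the same Tesseract release.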