I'm trying to use pytesseract to do some very basic character recognition. When I run the following code on Linux, the output makes sense:
import matplotlib.pyplot as plt
import pandas as pd
import sys
import pytesseract
# On Windows, pytesseract needs the full path to the Tesseract executable.
if sys.platform == 'win32':
    tesseract_path = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    pytesseract.pytesseract.tesseract_cmd = tesseract_path
img = pd.read_csv('https://www.dropbox.com/s/fcs5bcmy73j75o0/two.csv?dl=1').values
fig, ax = plt.subplots()
ax.imshow(img.astype(float), cmap='gray')
print('identified as {}'.format(pytesseract.image_to_string(img.astype(float))))
But the same call to pytesseract.image_to_string on Windows returns an empty string.
Code is executed on both machines in a Python 3 environment.
Is there an obvious step I might have missed when installing Tesseract on my Windows machine that would explain this behavior?
On Windows, Tesseract was installed using the installer from: https://github.com/UB-Mannheim/tesseract/wiki
In Linux, I simply used:
yum install tesseract
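Since the two installs come from different sources, they may not be the same Tesseract version. Below is a minimal sketch of how the version could be compared on both machines. The helper names `parse_version` and `installed_tesseract_version` are my own, and I'm assuming the first line of the `tesseract --version` output looks like `tesseract 4.1.1`; the Windows path is the same one used in the code above.

```python
import shutil
import subprocess
import sys

# Same install location as in the code above (an assumption for other setups).
WINDOWS_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def parse_version(banner):
    """Return the version token from the first line of `tesseract --version` output."""
    lines = banner.strip().splitlines()
    if not lines:
        return None
    parts = lines[0].split()
    # Assumed banner shape: "tesseract <version>"; anything else yields None.
    if len(parts) >= 2 and parts[0].lower() == "tesseract":
        return parts[1]
    return None


def installed_tesseract_version():
    """Run the tesseract binary and report its version, or None if it cannot be run."""
    exe = WINDOWS_TESSERACT if sys.platform == "win32" else shutil.which("tesseract")
    if exe is None:
        return None
    try:
        result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    except OSError:
        return None
    # The banner may appear on either stream, so check both.
    return parse_version(result.stdout) or parse_version(result.stderr)


if __name__ == "__main__":
    print("tesseract version:", installed_tesseract_version())
```

Running this on both machines would show whether the Linux and Windows installs actually match before looking for other causes.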