I am trying to identify single digits in Python with Tesseract.
My code is this:
import numpy as np
from PIL import Image
from PIL import ImageOps
import pytesseract
import cv2
def predict(imageArray):
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    newImageArray = Image.open(imageArray)
    number = pytesseract.image_to_string(newImageArray, lang='eng', config='--psm 10 --oem 1 -c tessedit_char_whitelist=0123456789')
    return number
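For readers unfamiliar with the `config` string: `--psm 10` asks Tesseract to treat the image as a single character, `--oem 1` selects the LSTM engine, and `tessedit_char_whitelist` restricts the output to the listed characters. A minimal sketch of how that string is put together; `build_config` is a hypothetical helper of mine, not part of pytesseract:

```python
def build_config(psm=10, oem=1, whitelist="0123456789"):
    """Assemble a Tesseract config string (hypothetical helper, not a pytesseract API)."""
    # --psm 10: treat the image as a single character
    # --oem 1: use the LSTM engine only
    # tessedit_char_whitelist: restrict recognised output to these characters
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


config = build_config()
print(config)
```

The returned string is what I pass as the `config` argument to `pytesseract.image_to_string`.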
It has no problem recognising one image as an 8, but it does not recognise another as a 4. My images contain only the digits 0-9.
This is just one example; there are other instances where it struggles to identify "obvious/clear" digits.
Currently, the only thing I am doing to my starting image is converting it to grayscale, using the following:
cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Is there a way I can improve the accuracy? All of my images are clear, computer-typed images, so I feel the accuracy should be a lot higher than it is.