I am trying to do OCR on images of vehicles such as trains and trucks to identify the numbers and characters written on them. (Note that this is not license-plate OCR.)
I took this image. The goal is to extract the text "BN SF 721 734" written on it.
For pre-processing, I first converted the image to grayscale and then binarized it. The binarized image looks something like this:
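To make these two pre-processing steps concrete, here is a minimal pure-Python sketch of what they compute for each pixel. It assumes the image is a list of rows of (R, G, B) tuples and uses the luma weights 0.299, 0.587 and 0.114 that OpenCV documents for its color-to-gray conversion (OpenCV's own rounding may differ slightly). The function names are hypothetical, for illustration only.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) tuples to rows of 8-bit gray values."""
    # Weighted sum of the channels (BT.601 luma weights), rounded to an int
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def binarize(gray_rows, thresh=127, maxval=255):
    """Fixed-threshold binarization, like cv2.THRESH_BINARY:
    pixels strictly greater than thresh become maxval, the rest become 0."""
    return [[maxval if v > thresh else 0 for v in row] for row in gray_rows]


# A 1x3 "image": a white, a dark blue and a light gray pixel
pixels = [[(255, 255, 255), (10, 20, 120), (200, 200, 200)]]
gray = to_grayscale(pixels)
print(gray)            # [[255, 28, 200]]
print(binarize(gray))  # [[255, 0, 255]]
```

Because the weights differ per channel, the channel order matters: `cv2.imread` returns pixels in BGR order, so the matching conversion code is `cv2.COLOR_BGR2GRAY`.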
I then ran Tesseract on it using the tesserocr Python bindings:
from PIL import Image
from tesserocr import PyTessBaseAPI

myimg = "image.png"
image = Image.open(myimg)
with PyTessBaseAPI() as api:
    api.SetImage(image)
    api.Recognize()
    words = api.GetUTF8Text()
    print(words)
    print(api.AllWordConfidences())
This code gave me blank output with a confidence value of 95, which I took to mean that Tesseract was 95% confident that there is no text in the image.
Next, I used Tesseract's SetRectangle API to restrict OCR to a particular region of the image instead of running it on the entire image.
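from PIL import Image
from tesserocr import PyTessBaseAPI

myimg = "image.png"
image = Image.open(myimg)
with PyTessBaseAPI() as api:
    api.SetImage(image)
    # SetRectangle(left, top, width, height): only this region is recognized
    api.SetRectangle(665, 445, 75, 40)
    api.Recognize()
    words = api.GetUTF8Text()
    print(words)
    print(api.AllWordConfidences())
    print("----")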
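The coordinates 665, 445, 75 and 40 describe a rectangle that contains the text BNSF 721 734 in the image; I measured them as 665 = top, 445 = left, 75 = width and 40 = height. Note that SetRectangle takes its arguments in the order (left, top, width, height), so this call treats 665 as the left edge and 445 as the top edge.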
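Because SetRectangle expects (left, top, width, height), a box measured as top/left is easy to pass in swapped. The hypothetical helpers below name each coordinate explicitly and return both the SetRectangle argument order and the (left, upper, right, lower) box that PIL's `Image.crop` expects, in case you prefer to crop the image yourself before calling SetImage. The `Box` type and function names are illustrative assumptions, not part of tesserocr or PIL.

```python
from collections import namedtuple

# Hypothetical box type: pixel offsets measured from the image's top-left corner
Box = namedtuple("Box", ["top", "left", "width", "height"])


def set_rectangle_args(box):
    """Arguments in the order Tesseract's SetRectangle takes them."""
    return (box.left, box.top, box.width, box.height)


def pil_crop_box(box):
    """Box in the (left, upper, right, lower) order used by PIL's Image.crop."""
    return (box.left, box.top, box.left + box.width, box.top + box.height)


# If 665 really is the top edge and 445 the left edge:
box = Box(top=665, left=445, width=75, height=40)
print(set_rectangle_args(box))  # (445, 665, 75, 40)
print(pil_crop_box(box))        # (445, 665, 520, 705)
```

So if the box was measured as described, the call would need to be `api.SetRectangle(445, 665, 75, 40)`; if 665 is in fact the left edge, the existing call is already correct.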
The output I got was this:
an s
m,m
How can I improve these results? I tried several different values for SetRectangle; the output varied a bit, but every variant was equally bad.
For reference, this is how I binarized the image with OpenCV:
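import cv2

myimg = "image.png"
img = cv2.imread(myimg)  # OpenCV loads images in BGR channel order
grayscale_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Option 1: Otsu's method picks the threshold automatically (the 128 is ignored)
(thresh, im_bw) = cv2.threshold(grayscale_img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
# Option 2: a fixed global threshold of 127
thresh = 127
binarized_img = cv2.threshold(grayscale_img, thresh, 255, cv2.THRESH_BINARY)[1]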
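With `cv2.THRESH_OTSU`, OpenCV ignores the threshold value passed in and computes one from the image histogram. As a rough illustration of how that choice is made, here is a pure-Python sketch of Otsu's method: it tries every threshold t and keeps the one that maximizes the between-class variance w0 * w1 * (mu0 - mu1)^2 of the two pixel groups (values <= t and values > t). It is not OpenCV's implementation, and the function name is hypothetical.

```python
def otsu_threshold(gray_values):
    """Return the threshold t (0-255) maximizing between-class variance.

    Pixels <= t form one class and pixels > t the other, matching how
    THRESH_BINARY applies the returned value. Expects ints in 0..255.
    """
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split separates nothing
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:  # keep the first threshold reaching the max
            best_t, best_var = t, between_var
    return best_t


# Dark text pixels (10-15) on a bright background (180-200)
values = [10, 12, 15] * 10 + [180, 190, 200] * 10
print(otsu_threshold(values))  # 15
```

Comparing the Otsu value for a given photo with the fixed 127 can show whether a single global threshold suits that image at all; on uneven lighting, neither may separate the characters cleanly.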