I am trying to analyze a page footer in a video and retrieve the current page number. I got the frame collection working, but I am struggling with reading the page number itself using EasyOCR.
I already tried pytesseract, but that doesn't work well. Numbers get misread: 10 is recognized as 113, 6 as 41, and so on. Overall it's very inconsistent, even though I prepare my input image with grayscale conversion, thresholding and cropping (so that only the page-number area of the footer is analyzed).
Here is the code:
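import cv2
import easyocr

# assumption: `reader` is an English EasyOCR reader created once; its setup was not part of the original snippet
reader = easyocr.Reader(['en'])

def getPageNumberTest(path, psm):
    # note: psm is not used in this EasyOCR version of the function
    image = cv2.imread(path)
    height, width = image.shape[:2]
    # the height of the footer
    footerHeight = 90  # int(height / 15.5)
    # retrieve only the footer from the image
    cropped = image[height - footerHeight:height, 0:width]
    results = reader.readtext(cropped)
    return results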
Which gives me the following output:
Is there a setting I am missing? Is there a way to instruct EasyOCR to look for numbers only? Any help or hint is appreciated!
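Until I find such a setting, the closest I can get is filtering on my side. This is only a minimal sketch: it assumes the default `readtext` result format of `(bbox, text, confidence)` tuples, and both the helper name `pick_page_number` and the `min_confidence` cutoff are placeholders of mine, not part of EasyOCR.

```python
def pick_page_number(results, min_confidence=0.5):
    """Return the most confident digits-only reading as an int, or None.

    `results` is expected to be a list of (bbox, text, confidence) tuples,
    which is what reader.readtext() returns with its default settings.
    `min_confidence` is an arbitrary cutoff chosen for illustration.
    """
    candidates = []
    for _bbox, text, confidence in results:
        cleaned = text.strip()
        # keep only readings made up entirely of ASCII digits
        if cleaned.isascii() and cleaned.isdigit() and confidence >= min_confidence:
            candidates.append((confidence, int(cleaned)))
    if not candidates:
        return None
    # the tuple with the highest confidence wins
    return max(candidates)[1]
```

This does not stop a 10 from being read as 113, of course. It only discards non-numeric readings and low-confidence guesses.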
EDIT:
After some fiddling with optimizations of the number images, I am now back at the beginning and not optimizing the images at all. All that's left is the conversion to grayscale and a resize.
This is what a normal input looks like:
But the results are:
Which is weird, because for most numbers (especially for single digits) this works flawlessly, yielding certainties of over 95%...
I tried deblurring, thresholding, denoising with cv2.filter2D(), blurring, ...
When I use thresholding, for example, my output looks like this (ignoring the "1", same applies for the single digit "1"):
I had a look into pattern matching, which isn't an option because I don't know the shape of the page number beforehand...