4

I'm trying to use Tesseract OCR to read the numbers in the images below, but I'm having trouble getting consistent results because of the spotted background. I have the following configuration for pytesseract:

CONFIG = "--psm 6 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄabcdefghijklmnopqrstuvwxyzåäö.,-"

I have also tried the image pre-processing below, with some good results, but they are still not perfect:

blur = cv2.blur(img,(4,4))
(T, threshInv) = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

What I want is to consistently identify the numbers and the decimal separator. What image pre-processing could help in getting consistent results on images like the ones below?

[7 example images]

Jeru Luke
MisterButter

2 Answers

5

You may find a solution using a slightly more complex approach by filtering in the frequency domain instead of the spatial domain. The thresholds might require some tweaking depending on how tesseract performs with the output images.

Implementation:

import cv2
import numpy as np
from matplotlib import pyplot as plt

img = cv2.imread('C:\\Test\\number.jpg', cv2.IMREAD_GRAYSCALE)

# Perform 2D FFT
f = np.fft.fft2(img)
fshift = np.fft.fftshift(f)
magnitude_spectrum = 20*np.log(np.abs(fshift))

# Squash all of the frequency magnitudes above a threshold
fshift[magnitude_spectrum > 195] = 0

# Inverse FFT back into the real-spatial-domain
f_ishift = np.fft.ifftshift(fshift)
img_back = np.fft.ifft2(f_ishift)
img_back = np.real(img_back)
img_back = cv2.normalize(img_back, None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
out_img = np.copy(img)

# Use the inverted FFT image to keep only the black values below a threshold
out_img[img_back < 100] = 0
out_img[img_back >= 100] = 255

plt.subplot(131),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(132),plt.imshow(img_back, cmap = 'gray')
plt.title('Reversed FFT'), plt.xticks([]), plt.yticks([])
plt.subplot(133),plt.imshow(out_img, cmap = 'gray')
plt.title('Output'), plt.xticks([]), plt.yticks([])
plt.show()

Output:

[Image: output]
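The filtering and thresholding steps above can be wrapped in a reusable function built on boolean masks. This is only a minimal sketch using NumPy alone: the function name `suppress_strong_frequencies`, the small epsilon inside the logarithm, and the random test image are assumptions for illustration, and the defaults `195` and `100` are simply the values used above.

```python
import numpy as np

def suppress_strong_frequencies(img, mag_threshold=195.0):
    """Zero every frequency whose log-magnitude exceeds mag_threshold,
    then return the inverse transform rescaled to the range 0..255."""
    f = np.fft.fftshift(np.fft.fft2(img.astype(np.float64)))
    # The epsilon (an assumption) avoids log(0) on exactly-zero coefficients
    magnitude = 20 * np.log(np.abs(f) + 1e-12)
    f[magnitude > mag_threshold] = 0
    back = np.real(np.fft.ifft2(np.fft.ifftshift(f)))
    lo, hi = back.min(), back.max()
    if hi == lo:
        return np.zeros_like(back)
    return (back - lo) / (hi - lo) * 255.0

# Random stand-in for a grayscale image (not one of the real samples)
rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(32, 48)).astype(np.uint8)

filtered = suppress_strong_frequencies(demo)
# Keep only the dark values below the second threshold
binary = np.where(filtered < 100, 0, 255).astype(np.uint8)
```

Both thresholds will likely still need tuning per image set, as noted above.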

Median Blur Implementation:

import cv2
import numpy as np

img = cv2.imread('C:\\Test\\number.jpg', cv2.IMREAD_GRAYSCALE)
blur = cv2.medianBlur(img, 3)

blur[blur < 20] = 0

cv2.imshow("Test", blur)
cv2.waitKey()

Output:

[Image: median blur output]
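To see why a 3x3 median removes isolated dots but also nibbles at the digits, here is a small NumPy-only sketch. The helper `median3x3` is hypothetical and stands in for `cv2.medianBlur(img, 3)`; the edge-replicating padding and the toy image are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median3x3(img):
    """Replace each pixel with the median of its 3x3 neighbourhood."""
    padded = np.pad(img, 1, mode='edge')
    windows = sliding_window_view(padded, (3, 3))  # shape (H, W, 3, 3)
    return np.median(windows, axis=(2, 3)).astype(img.dtype)

toy = np.full((9, 9), 255, dtype=np.uint8)
toy[2, 2] = 0        # isolated background dot
toy[5:8, 4:7] = 0    # 3x3 block standing in for part of a digit

smoothed = median3x3(toy)
print(smoothed[2, 2])  # the lone dot is outvoted by its white neighbours
print(smoothed[6, 5])  # the centre of the block stays dark
print(smoothed[5, 4])  # the corner of the block turns white
```

The last line illustrates the thinning effect reported in the comments below: corners and thin strokes lose pixels, which can make digits harder for Tesseract to read.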

Final Edit:

Using Eumel's solution and adding the bit of code at the bottom of it yields a 100% successful result:

img[pat_thresh_1==1] = 255
img[pat_thresh_15==1] = 255
img[pat_thresh_2==1] = 255
img[pat_thresh_25==1] = 255
img[pat_thresh_3==1] = 255
img[pat_thresh_35==1] = 255
img[pat_thresh_4==1] = 255

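# Eumel's code above this line

# Needed for the OCR call below
import pytesseract
from PIL import Image

# Eroding a dark-on-white image thickens the dark digits
img = cv2.erode(img, np.ones((3,3)))

cv2.imwrite("out.png", img)
cv2.imshow("Test", img)
cv2.waitKey()  # keep the preview window open until a key is pressed

print(pytesseract.image_to_string(Image.open("out.png"), lang='eng', config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789.,'))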
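It may look odd that erosion makes the digits thicker. Erosion takes the local minimum, so on a white background with dark text the dark pixels spread. The sketch below shows this with NumPy only; `erode3x3` is a hypothetical stand-in for `cv2.erode` with a 3x3 kernel, and padding the border with white is an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def erode3x3(img):
    """Replace each pixel with the minimum of its 3x3 neighbourhood."""
    padded = np.pad(img, 1, mode='constant', constant_values=255)
    windows = sliding_window_view(padded, (3, 3))  # shape (H, W, 3, 3)
    return windows.min(axis=(2, 3))

toy = np.full((7, 7), 255, dtype=np.uint8)
toy[:, 3] = 0                 # a dark stroke one pixel wide

thick = erode3x3(toy)
print((thick == 0).sum(axis=1))  # dark pixels per row after erosion
```

Each row goes from one dark pixel to three, which matches the observation in the comments that Tesseract reads thicker text more reliably.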

Output Image Examples:

[8 output image examples]

Whitelisting the characters Tesseract may return also appears to help quite a bit in preventing false identifications.
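For experimenting with different whitelists and page segmentation modes, the config string can be assembled by a small helper. This is just a sketch: `build_tesseract_config` is a hypothetical name, and it only produces the same kind of string that is passed to `config=` above.

```python
def build_tesseract_config(psm, whitelist, oem=None):
    """Assemble a Tesseract config string from its parts."""
    parts = [f"--psm {psm}"]
    if oem is not None:
        parts.append(f"--oem {oem}")
    parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)

# Digits and the two separators only, as in the call above
numeric_config = build_tesseract_config(10, "0123456789.,", oem=3)
print(numeric_config)
```

Note that the options start with two ASCII hyphens (`--psm`); a pasted long dash in their place is easy to miss when reading the string.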

Abstract
  • Thank you, I will try this out and report back with results! Since I know practically nothing about this, I highly appreciate this answer. Thank you for taking your time and effort! – MisterButter Jul 28 '21 at 15:11
  • @MisterButter No problem! If it doesn't perform well, please let me know. There are many more tricks for solving a problem like this, and some may work better. The only tricky part is not having tesseract set up on my machine, so I can't readily provide feedback on performance. – Abstract Jul 28 '21 at 15:12
  • I tried your approach and found that it did not perform that well (less consistent than blurring and thresholding). The problem seems to be that the spotted background that remains after the transformation interferes, and that a number such as "0" gets eroded somewhat in the process, making it difficult for tesseract to get a read. My experience is that tesseract is much more accurate when text is thicker rather than thinner. – MisterButter Jul 28 '21 at 15:40
  • Added an additional method to my answer. Try a simple median blur. Although we might have to do something about that first number `1`, but if the remainder of the output looks good, we can fix that. – Abstract Jul 28 '21 at 15:52
  • @MisterButter and additionally, try replacing the `20` threshold with `242`. I'm seeing a bit more noise, but much more distinct results at that threshold – Abstract Jul 28 '21 at 16:02
  • I found about the same consistency with both thresholds, and they make approximately the same errors, still not performing better than blurring and thresholding. This seems like a very hard problem to solve with consistent results! – MisterButter Jul 28 '21 at 16:24
  • I'll install tesseract here in a bit and see if we can put a nail in this one, provided nobody comes up with an elegant solution between then and now. – Abstract Jul 28 '21 at 16:26
  • Haha, stubborn. I really appreciate you trying to crack this one! Keep in mind that I use tesseract v5 alpha. – MisterButter Jul 28 '21 at 16:34
  • Think between Eumel's answer and my (very small) addition to it, looks like you have a good solution – Abstract Jul 28 '21 at 20:10
  • Wow, those are some amazing results, thank you so much! The output images look really good. Thanks again for the time and effort that you put into this. – MisterButter Jul 29 '21 at 05:58
5

That was a challenge, but I think I have an interesting approach: pattern matching.

If you zoom in, you will see that the background pattern consists of only 4 possible dots: a single full pixel, a double full pixel, and a double pixel with a medium-intensity pixel on the left or right. So I grabbed these 4 patterns from the image with 17.160.000,00 and got to work. You can save them to load again later; I just grabbed them on the fly.

import cv2
import numpy as np

img = cv2.imread('C:/Users/***/17.jpg', cv2.IMREAD_GRAYSCALE)

# each pattern is a 3x4 crop (rows x columns) around one type of dot
pattern_1 = img[2:5,1:5]
pattern_2 = img[6:9,5:9]
pattern_3 = img[6:9,11:15]
pattern_4 = img[9:12,22:26]

# just to show it carries over to other pics ;)
img = cv2.imread('C:/Users/****/6.jpg', cv2.IMREAD_GRAYSCALE)

Actual Pattern Matching

Next we match all the patterns and threshold the results to find all occurrences. I used 0.7, but you can play around with it a little. The match result is smaller than the image and marks only a single pixel on the left of each match, so we pad it back to the image size. For the first 3 patterns we pad twice (the second time shifted by one extra pixel) to hit both pixels of the dot. The last one is the single pixel, so it doesn't need the second pad.

res_1 = cv2.matchTemplate(img,pattern_1,cv2.TM_CCOEFF_NORMED )
thresh_1 = cv2.threshold(res_1,0.7,1,cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_1 = np.pad(thresh_1,((1,1),(1,2)),'constant')
pat_thresh_15 = np.pad(thresh_1,((1,1),(2,1)), 'constant')
res_2 = cv2.matchTemplate(img,pattern_2,cv2.TM_CCOEFF_NORMED )
thresh_2 = cv2.threshold(res_2,0.7,1,cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_2 = np.pad(thresh_2,((1,1),(1,2)),'constant')
pat_thresh_25 = np.pad(thresh_2,((1,1),(2,1)), 'constant')
res_3 = cv2.matchTemplate(img,pattern_3,cv2.TM_CCOEFF_NORMED )
thresh_3 = cv2.threshold(res_3,0.7,1,cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_3 = np.pad(thresh_3,((1,1),(1,2)),'constant')
pat_thresh_35 = np.pad(thresh_3,((1,1),(2,1)), 'constant')
res_4 = cv2.matchTemplate(img,pattern_4,cv2.TM_CCOEFF_NORMED )
thresh_4 = cv2.threshold(res_4,0.7,1,cv2.THRESH_BINARY)[1].astype(np.uint8)
pat_thresh_4 = np.pad(thresh_4,((1,1),(1,2)),'constant')
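To make the padding step concrete, here is a tiny NumPy-only sketch. The image size and the hit position are made up for illustration; the template size of 3x4 and the pad widths are the ones used above.

```python
import numpy as np

H, W = 12, 20   # image size (rows, columns), made up for this example
h, w = 3, 4     # template size used above

# A match result has one entry per possible top-left corner of the template
hits = np.zeros((H - h + 1, W - w + 1), dtype=np.uint8)
hits[5, 7] = 1  # pretend the template matched with its corner at (5, 7)

left = np.pad(hits, ((1, 1), (1, 2)), 'constant')
right = np.pad(hits, ((1, 1), (2, 1)), 'constant')

print(left.shape, right.shape)   # both are back at the image size
print(np.argwhere(left == 1))    # the hit moved one row down, one column right
print(np.argwhere(right == 1))   # same row, one more column to the right
```

So a match whose corner is at `(5, 7)` ends up marking image pixels `(6, 8)` and `(6, 9)`, the two middle pixels of the second row of the 3x4 crop, which is where the dot sits.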

Editing the Image

Now the only thing left to do is remove all the matches from the image. Since we have a mostly white background, we just set them to 255 to blend in.

img[pat_thresh_1==1] = 255
img[pat_thresh_15==1] = 255
img[pat_thresh_2==1] = 255
img[pat_thresh_25==1] = 255
img[pat_thresh_3==1] = 255
img[pat_thresh_35==1] = 255
img[pat_thresh_4==1] = 255

Output

[Image: no more background]

Edit:

Take a look at Abstract's answer as well for refining this output and fine-tuning Tesseract.

Eumel
  • Lol, ran this through tesseract on my end and it still struggles. The number `3.114.758,00` image outputs `114.758,00` and `17.160.000,00` becomes `160.000,00`. But we're getting close! – Abstract Jul 28 '21 at 17:54
  • This looks phenomenal, I will try it out asap tomorrow, your output looks extremely clean! Thank you @Eumel for the writeup and for your effort and time, this looks really promising and I hope it will produce the desired results, thank you! – MisterButter Jul 28 '21 at 17:54
  • It passed 6/7 tests, with only 1 digit wrong! This is the highest accuracy I have seen! The only number it read wrong was 3.114.758,00, which it read as 3.414.758,00. This is still insanely good accuracy! – MisterButter Jul 28 '21 at 18:00
  • This has proven to be a stable approach that returns consistent results. I added a Gaussian blur and Otsu thresholding, and the accuracy was great for the different images! Thank you for showing me that it can be done this way! – MisterButter Jul 28 '21 at 19:57