Is it possible to somehow make it so that all text in a document is black on white after thresholding? I've been looking online a lot but I haven't been able to find a solution. My current thresholded image is: https://i.ibb.co/Rpqcp7v/thresh.jpg

The document needs to be read by an OCR engine, and for that I need the areas that are currently white on black to be inverted. How would I go about doing this? My current code:

import cv2

# thresholding
def thresholding(image):
    # Threshold a grayscale image into a binary (black and white) image
    # with a fixed cutoff of 120; [1] picks the image out of (retval, image).
    return cv2.threshold(image, 120, 255, cv2.THRESH_BINARY)[1]
  • Why wouldn't regular OCR work with your current thresholding? The images shouldn't get detected and therefore shouldn't be an issue? – Dan P Nov 03 '21 at 14:19
  • I've just noticed that the wrong data is very often read out. So I have several ways of preprocessing; it picks the highest-confidence result in the end, and usually the result is a lot better this way. This is still the main obstacle I have. – Sander Berntsen Nov 03 '21 at 14:25
  • Maybe try this to get just the text from the image, then run your OCR on it after: https://stackoverflow.com/a/54125216/9178557 – Dan P Nov 03 '21 at 14:31
  • I already tried inverted thresholding, but it wouldn't fix the problem, right? Unless I could combine the resulting images somehow (take the white-background area from each image). – Sander Berntsen Nov 03 '21 at 14:32

1 Answer

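Use a median filter to estimate the dominant color (the background) in each neighborhood.

Then take the difference between that background estimate and the image. I'm using the absolute difference, so text comes out white on a black background whether it was originally darker or lighter than its surroundings. Invert the result for black on white.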

import cv2 as cv

im = cv.imread("thresh.jpg", cv.IMREAD_GRAYSCALE)
im = cv.pyrDown(cv.pyrDown(im)) # picture too large for Stack Overflow
bg = cv.medianBlur(im, 51) # suitably large kernel to cover all text
out = 255 - cv.absdiff(bg, im) # absdiff: white text on black; 255 - ... inverts it

Christoph Rackwitz
  • This is fantastic and works great. I do get a slightly different result: https://i.ibb.co/C8Gg9y1/thresh-0.jpg . The main problem here is, for example, the name at the top; it is unreadable by the OCR. Which is odd, since I basically copied your solution. Why do I get a different result .-. – Sander Berntsen Nov 05 '21 at 07:45
  • I didn't reveal everything in my solution. I downscaled the input a few times. Use a larger kernel size to compensate. It's a fiddle factor. – Christoph Rackwitz Nov 05 '21 at 08:39
  • I see. It works better when I set it to 75, for example. Thank you! – Sander Berntsen Nov 05 '21 at 08:43
  • That's the best image processing method I've ever seen for an OCR operation. Thank you. – quents Jan 07 '22 at 14:50
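
On the kernel-size point from the comments: the answer's 51 was chosen for an image that had been downscaled, so a full-resolution scan needs a larger kernel. A rough starting point, assuming the kernel should grow linearly with the resolution (my assumption, not something stated in the thread), could be computed like this. The helper name `scaled_odd_kernel` is hypothetical.

```python
def scaled_odd_kernel(base_ksize, scale):
    """Sketch: scale a median kernel size and force it to be odd.

    base_ksize: kernel size that worked at the reduced resolution
    scale:      how much larger the target image is per dimension
    """
    k = int(round(base_ksize * scale))
    k = max(k, 3)                      # keep a usable minimum size
    return k if k % 2 == 1 else k + 1  # cv.medianBlur needs an odd size

# Two pyrDown calls halve each dimension twice, i.e. a factor of 4.
full_res_kernel = scaled_odd_kernel(51, 4)
```

Treat the result only as an upper starting value for the fiddle factor. The asker reported that 75 already worked better on their image, and very large median kernels get slow.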