I have an image with mostly light text on a dark background. Some of the text is a darker color (purple).
I'm using opencv-python to manipulate the image for better OCR parsing.
There is a little more processing that happens before this, but I believe the steps giving me trouble are as follows.
The image gets converted to grayscale:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
The image then gets inverted (this seems to keep the final text clearer):
img = cv2.bitwise_not(img)
The image then gets run through an Otsu threshold:
thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
You can see I'm totally losing the darker text. Switching to an adaptive threshold does preserve that text better, but it creates a ton of noise (the background looks flat black but is not).
Any thoughts on how I can modify my current thresholding to preserve that darker text?