Adaptive thresholding handling for big black letters

Question

I have a document that has both very big and very small letters and I'm applying adaptive thresholding to it.

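// Convert to grayscale, smooth, then threshold each pixel against the
// Gaussian-weighted mean of its 11x11 neighbourhood minus 3.
cvtColor(mbgra, dst, CV_BGR2GRAY);
GaussianBlur(dst, dst, Size(11, 11), 0);
adaptiveThreshold(dst, dst, 255, ADAPTIVE_THRESH_GAUSSIAN_C, THRESH_BINARY, 11, 3);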

The algorithm works well, but I have a small problem with big black letters: they come out hollow on the inside, like this:

The original image has those letters filled with black.
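For reference, the hollowing follows from how `THRESH_BINARY` adaptive thresholding decides: a pixel becomes white when it is brighter than the local mean minus the constant. Deep inside a thick stroke the whole window is black, so the mean is about 0, the threshold goes negative, and the black pixel passes the test. Below is a minimal pure-Python sketch of that arithmetic on a single row of pixels. It uses a plain (not Gaussian-weighted) mean, and `adaptive_mean_threshold_1d` is a hypothetical helper written for illustration, not an OpenCV function.

```python
def adaptive_mean_threshold_1d(row, block_size, c):
    """Mean adaptive threshold on one row: white (255) if pixel > local mean - c."""
    half = block_size // 2
    out = []
    for i, value in enumerate(row):
        # The window is simply cut off at the row ends (an assumption of this sketch).
        window = row[max(0, i - half): i + half + 1]
        local_mean = sum(window) / len(window)
        out.append(255 if value > local_mean - c else 0)
    return out


# White background (255) with a 30-pixel-wide black stroke (0) in the middle.
row = [255] * 20 + [0] * 30 + [255] * 20
result = adaptive_mean_threshold_1d(row, block_size=11, c=3)

# Only pixels whose window still sees some white background stay black.
black_positions = [i for i, v in enumerate(result) if v == 0]
print(black_positions)
```

With a block size of 11, only the five pixels nearest each edge of the stroke stay black; everything further inside is turned white, which matches the hollow letters.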

How can I keep those letters filled with black, as in the original image, without increasing the block size of the filter? A larger block size won't play well with the small letters!

Any thoughts or suggestions are of course welcome!

Have you tried playing with the ```blockSize``` parameter? Perhaps increasing the neighborhood could help? — jolaem, Jan 18 '19 at 16:02
This is a cutout of the original image, which also has small letters, and if the block size is very big, it won't play well with the small letters! — Ahmed Hegazy, Jan 18 '19 at 16:05
Does it need to be automatic? If not, you can just apply the larger blocks to certain parts of the image. Otherwise you can try something along the lines of a [pyramid](https://en.wikipedia.org/wiki/Pyramid_(image_processing)), where you increase the window size and select the best result as validated against ground truth (provided you have one). — jolaem, Jan 18 '19 at 16:11
@AhmedHegazy Have you tried a basic binary threshold (maybe even OTSU?) and then applying morphological filters (erode/dilate) to get rid of tiny holes? — George Profenza, Jan 18 '19 at 17:09
Seems to me that your real problem is extracting text from the image with both small AND large text - that would be a better image to post in a question. — DisappointedByUnaccountableMod, Jan 18 '19 at 20:39
@jolaem I don't have prior knowledge of where the letters will be arranged! — Ahmed Hegazy, Jan 21 '19 at 12:17
@GeorgeProfenza I've tried OTSU, but it gives me bad results if there is some shading on the image! — Ahmed Hegazy, Jan 21 '19 at 12:19
@AhmedHegazy Have you tried running it on a per-patch/window basis with an iteratively increasing ```blockSize``` parameter within each patch? Then you can choose the best result per patch/window. — jolaem, Jan 21 '19 at 15:09

score 0 · Answer 1 · answered Jan 18 '19 at 16:40

The following code:

import numpy as np
import cv2
import matplotlib.pyplot as plt

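image = cv2.imread("FYROJ.png")
# cv2.imread returns BGR, so convert with COLOR_BGR2GRAY.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 3)

# findContours returns 3 values in OpenCV 3.x and 2 in 4.x; the last two are always (contours, hierarchy).
contours, hier = cv2.findContours(thresh, mode=cv2.RETR_TREE, method=cv2.CHAIN_APPROX_NONE)[-2:]
hier = hier[0]
# Keep only contours that have a child (hierarchy entry [2] is the first-child index).
kept_contours = [contour for idx, contour in enumerate(contours) if hier[idx][2] >= 0]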

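# Draw the kept contours and label them to get seed markers for the watershed.
drawing = np.zeros_like(gray)
cv2.drawContours(drawing, kept_contours, -1, color=255)

ret, markers = cv2.connectedComponents(drawing)

# watershed expects a 3-channel 8-bit image and int32 markers.
watershed_res = cv2.watershed(image, np.int32(markers))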

plt.imshow(watershed_res)
plt.show()

Will generate this image:

You could start from here and select the regions that contain a lot of black pixels in the original image.
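
One possible reading of that last step, as a rough sketch: for every labeled region, measure what fraction of its pixels are dark in the original grayscale image and paint the region black in the thresholded output when that fraction is high. The helper `fill_dark_regions` and the cut-offs `dark_level=80` and `min_dark_fraction=0.8` are assumptions for illustration and would need tuning on real images.

```python
import numpy as np


def fill_dark_regions(gray, labels, binary, dark_level=80, min_dark_fraction=0.8):
    """Paint a labeled region black in `binary` if it is mostly dark in `gray`."""
    out = binary.copy()
    for label in np.unique(labels):
        if label <= 0:
            # Watershed marks region boundaries with -1; 0 means unlabeled.
            continue
        mask = labels == label
        # Fraction of this region's pixels that are darker than dark_level.
        dark_fraction = np.mean(gray[mask] < dark_level)
        if dark_fraction >= min_dark_fraction:
            out[mask] = 0
    return out


# Tiny synthetic check: region 2 is a black square, region 1 is white background.
gray = np.full((10, 10), 255, dtype=np.uint8)
gray[2:6, 2:6] = 0
labels = np.ones((10, 10), dtype=np.int32)
labels[2:6, 2:6] = 2
binary = np.full((10, 10), 255, dtype=np.uint8)  # pretend the letter came out hollow

filled = fill_dark_regions(gray, labels, binary)
```

With the code above, `labels` would be `watershed_res` and `binary` would be `thresh`.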

Thanks a lot for your answer, but detecting contours will be very specific to this image, as the image could have a lot of noise inside, like icons and logos. What do you think? Please also take a look at my updated picture! — Ahmed Hegazy, Jan 21 '19 at 12:00
