I need to detect single characters as part of a bigger project, but seem to have hit a roadblock.
This is what I have to work with:
I've already done a bit of preprocessing to get to this stage (removing the background, de-skewing, de-warping, etc.), which I don't think I need to include. Beyond that, I've done nothing to the original image apart from a simple Otsu threshold.
Each image is a (60, 60, 1) numpy array. The picture attached above is a screengrab of matplotlib's output.
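(For reference, this is roughly how I'm viewing each array; char_img is a stand-in name for one of the (60, 60, 1) arrays, and imshow needs the channel axis squeezed out:)

import matplotlib.pyplot as plt

# char_img: one (60, 60, 1) uint8 array; drop the channel axis for display
plt.imshow(char_img.squeeze(), cmap='gray')
plt.axis('off')
plt.show()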
The output I get from EasyOCR, compared with the validation set, is below (the first list is the validation set; the second is what EasyOCR returns, with [] where it detected nothing):
['O', 'O', 'R', 'E', 'A', 'D', 'O', 'J', 'R', 'I', 'J', 'A', 'E', 'N', 'N', 'I', 'I', 'D', 'S', 'H', 'U', 'E', 'T', 'O', 'T', 'T', 'H', 'E', 'R', 'A', 'N']
[[], [], 'r', 'E', [], [], [], [], 'R', [], [], [], 'E', 'N', 'N', [], [], [], 'S', 'H', [], 'E', [], [], [], [], 'H', 'E', 'R', [], 'N', []]
12/31
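(The 12/31 is just a position-by-position comparison of the two lists, along the lines of this sketch; validation and predicted stand for the two lists above:)

validation = ['O', 'O', 'R', ...]   # first list above
predicted = [[], [], 'r', ...]      # second list above, [] = nothing detected

# exact match per position ('r' vs 'R' does not count)
score = sum(p == t for p, t in zip(predicted, validation))
print(f'{score}/{len(validation)}')  # 12/31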
I can't work out why this is so hard for it to read. I've tried resizing the image in case the character is too small, blurring it in case there's too much noise, and eroding and then blurring it. But I can't get past 20/31, no matter what I do. Any insights? Most of the time it doesn't detect a character at all, but when it does, it classifies it correctly. Could there be a reason for this behavior?
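Roughly, those variants looked like this (exact kernel sizes and scale factors varied between attempts, so treat the numbers as placeholders):

import cv2
import numpy as np

# 1) upscale, in case the glyph is too small for the detector
resized = cv2.resize(binary_img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

# 2) blur, in case there's too much noise
blurred = cv2.GaussianBlur(binary_img, (3, 3), 0)

# 3) erode (thickens dark strokes on a white background), then blur
kernel = np.ones((2, 2), np.uint8)
eroded = cv2.erode(binary_img, kernel, iterations=1)
eroded_blurred = cv2.GaussianBlur(eroded, (3, 3), 0)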
Each character image is 60x60, so you should be able to extract them from the larger image using NumPy slicing. I've tried Tesseract as well, but the results were worse.
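(Something like this, where board is a stand-in name for the full preprocessed image:)

# carve the board into 60x60 tiles, row-major
tiles = [board[r:r + 60, c:c + 60]
         for r in range(0, board.shape[0], 60)
         for c in range(0, board.shape[1], 60)]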
MRE:
The input to this function is a list of np arrays, representing the images below. The code below turns these images into what I've shown above:
import cv2
import easyocr
import numpy as np

reader = easyocr.Reader(['en'], gpu=False)

def read_tiles(tiles):
    changed_letters = []
    results = []
    for tile in tiles:
        letter = np.asarray(tile, np.uint8)
        gray = cv2.cvtColor(letter, cv2.COLOR_BGR2GRAY)
        _, binary_img = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

        (h, w) = letter.shape[:2]
        image_size = h * w

        # MSER proposes candidate character regions on the binarised tile
        mser = cv2.MSER_create()
        mser.setMaxArea(image_size // 6)
        mser.setMinArea(90)
        _, rects = mser.detectRegions(binary_img)

        # filtering for centre (assumes the glyph centre sits near (20, 20))
        good_rects = []
        thresh = 14
        for (x, y, width, height) in rects:
            if (abs(20 - (x + width // 2)) >= thresh) or (abs(20 - (y + height // 2)) >= (thresh - 8)):
                continue
            good_rects.append([x, y, width, height])

        # sorting by distance of the rect centre from the image centre, in case of duplicates
        good_rects.sort(key=lambda r: abs(w // 2 - (r[0] + r[2] // 2)))

        # pick the tallest centred rect that isn't essentially the full tile height
        centre_rect = None
        for rect in good_rects:
            if abs(rect[3] - h) <= 5:  # skip near-full-height rects
                continue
            if centre_rect is None:
                centre_rect = rect
                continue
            if rect[3] > centre_rect[3] and abs(rect[0] - centre_rect[0]) <= 3:
                centre_rect = rect
                break

        # taking the subsection that is the character and re-centring it on a white 60x60 canvas
        padded_char = letter
        if centre_rect is not None:
            (x, y, width, height) = centre_rect
            cv2.rectangle(letter, (x, y), (x + width, y + height), color=(255, 0, 255), thickness=1)
            character = binary_img[y:y + height, x:x + width]
            padded_char = np.full((60, 60), 255, dtype=np.uint8)
            x_center = (60 - character.shape[1]) // 2
            y_center = (60 - character.shape[0]) // 2
            padded_char[y_center:y_center + character.shape[0],
                        x_center:x_center + character.shape[1]] = character

        # NB: a 1x1 kernel makes this erosion a no-op
        kernel = np.ones((1, 1), np.uint8)
        padded_char = cv2.erode(padded_char, kernel, iterations=3)
        # Tesseract variant I also tried (worse results):
        # found_letter = pytesseract.image_to_string(padded_char, config='--psm 10 --oem 1 -c '
        #                                            'tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ')

        changed_letters.append(padded_char)
        results.append(reader.readtext(padded_char))
    return changed_letters, results
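For what it's worth, readtext also accepts an allowlist and a detail flag, so the recognition alphabet can be constrained to uppercase letters; I don't know yet whether that changes anything here:

# restrict recognition to uppercase letters; detail=0 returns only the strings
result = reader.readtext(padded_char,
                         allowlist='ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                         detail=0)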