
I need to detect single characters as part of a bigger project, but seem to have hit a roadblock.

This is what I have to work with: [image: Characters]

I've already done a bit of preprocessing to get to this stage (removing the background, correcting skew and warp, etc.), which I don't think I need to include here. Beyond that, I've done nothing to the original image apart from a simple Otsu threshold.

Each image is a (60, 60, 1) numpy array. The picture attached above is a screengrab of matplotlib's output.

This is the validation set (first list), compared to the output I get from EasyOCR (second list):

['O', 'O', 'R', 'E', 'A', 'D', 'O', 'J', 'R', 'I', 'J', 'A', 'E', 'N', 'N', 'I', 'I', 'D', 'S', 'H', 'U', 'E', 'T', 'O', 'T', 'T', 'H', 'E', 'R', 'A', 'N']

[[], [], 'r', 'E', [], [], [], [], 'R', [], [], [], 'E', 'N', 'N', [], [], [], 'S', 'H', [], 'E', [], [], [], [], 'H', 'E', 'R', [], 'N', []]
12/31 correct.

I can't work out why this is so hard for it to read. I've tried resizing the image in case the characters are too small, blurring it in case there's too much noise, and eroding and then blurring (a rough sketch of those attempts is below). But I cannot get past 20/30, no matter what I do. Any insights? It seems that most of the time it doesn't detect a character at all, but when it does detect one, it classifies it correctly. Could there be a reason for this behavior?
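For context, those attempts looked roughly like this, where img stands for one of the processed 60x60 character images (the exact scale factors and kernel sizes here are illustrative, not the precise values I used):

    import cv2
    import numpy as np

    # Upscale in case the glyphs are too small for the detector.
    upscaled = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
    # Blur in case there is too much noise.
    blurred = cv2.GaussianBlur(upscaled, (3, 3), 0)
    # Erode (thickens dark strokes on a light background), then blur.
    eroded = cv2.erode(upscaled, np.ones((3, 3), np.uint8), iterations=1)
    eroded_blurred = cv2.GaussianBlur(eroded, (3, 3), 0)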

Each character image is 60x60, so you should be able to extract them from the larger image using NumPy slicing (sketch below). I've tried Tesseract as well, but the results were worse.
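For example, assuming the tiles sit in a single row of a larger array (board here is just an illustrative name):

    # Cut a row of 60x60 tiles out of the larger image with plain slicing.
    tiles = [board[0:60, i * 60:(i + 1) * 60] for i in range(board.shape[1] // 60)]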

MRE:

The input to this function is a list of np arrays, representing the images below: [image: raw input]

Using the code below, these images are turned into what I've added above.

    import cv2
    import easyocr
    import numpy as np

    reader = easyocr.Reader(['en'])
    changed_letters = []

    def read_tile(tile):
        # `tile` is one 60x60 BGR character image.
        letter = np.asarray(tile, np.uint8)
        gray = cv2.cvtColor(letter, cv2.COLOR_BGR2GRAY)
        _, binary_img = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

        (h, w) = letter.shape[:2]
        image_size = h * w
        mser = cv2.MSER_create()
        mser.setMaxArea(image_size // 6)
        mser.setMinArea(90)

        regions, rects = mser.detectRegions(binary_img)

        # Keep only rects whose centre is close enough to the image centre.
        good_rects = []
        thresh = 14
        for (x, y, width, height) in rects:
            if (abs(20 - (x + width // 2)) >= thresh) or (abs(20 - (y + height // 2)) >= (thresh - 8)):
                continue
            good_rects.append([x, y, width, height])

        # Sort by horizontal distance of the rect centre from the image centre,
        # in case of duplicates. (Previously the key compared the rect *width*
        # to w // 2, which didn't match this intent.)
        good_rects.sort(key=lambda r: abs(w // 2 - (r[0] + r[2] // 2)))

        centre_rect = None
        for rect in good_rects:
            # Skip rects that span (almost) the full tile height.
            if abs(rect[3] - h) <= 5:
                continue
            if centre_rect is None:
                centre_rect = rect
                continue
            # Prefer a taller rect at (nearly) the same x position.
            if rect[3] > centre_rect[3] and abs(rect[0] - centre_rect[0]) <= 3:
                centre_rect = rect
                break

        # Fall back to the whole tile if no region was found.
        padded_char = letter
        if centre_rect is not None:
            (x, y, width, height) = centre_rect
            cv2.rectangle(letter, (x, y), (x + width, y + height), color=(255, 0, 255), thickness=1)

            # Take the subsection that is the character.
            # (This slice was x:x + height before -- a bug that used the rect's
            # height for the horizontal extent and clipped wide glyphs.)
            character = binary_img[y:y + height, x:x + width]

            # Centre the crop on a white 60x60 canvas.
            padded_char = np.full((60, 60), 255, dtype=np.uint8)
            x_center = (60 - character.shape[1]) // 2
            y_center = (60 - character.shape[0]) // 2
            padded_char[y_center:y_center + character.shape[0],
                        x_center:x_center + character.shape[1]] = character

        # NOTE: erosion with a 1x1 kernel is a no-op; the kernel must be
        # larger (e.g. 3x3) for these iterations to do anything.
        kernel = np.ones((1, 1), np.uint8)
        padded_char = cv2.erode(padded_char, kernel, iterations=3)

        # found_letter = pytesseract.image_to_string(padded_char, config='--psm 10 --oem 1 -c '
        #     'tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ')

        changed_letters.append(padded_char)
        return reader.readtext(padded_char)
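One thing worth noting from EasyOCR's docs: readtext accepts an allowlist parameter, the counterpart of the tesseract whitelist in the commented-out code. I haven't confirmed it changes anything here, but restricting the output to uppercase letters would look like this:

    # Restrict recognition to uppercase letters (EasyOCR analogue of
    # tesseract's tessedit_char_whitelist).
    result = reader.readtext(padded_char, allowlist='ABCDEFGHIJKLMNOPQRSTUVWXYZ')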
    Show your tesseract code. Perhaps you need to adjust one of the parameters. See page segmentation modes at https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md – fmw42 Jul 18 '23 at 01:01
  • @fmw42 As stated in the title, I'm using EasyOCR. When I tried Tesseract, I used psm 10, which seemed to be ideal. – Armaan Shah Jul 18 '23 at 21:43
  • Sorry, I do not know EasyOCR. But concepts in that reference might still be useful. – fmw42 Jul 18 '23 at 22:47
  • @fmw42 I've done all the steps in the reference as part of my initial preprocessing. Do you have any other insight as to why this text may not be readable? It isn't readable to a high degree of accuracy by tesseract either. – Armaan Shah Jul 19 '23 at 10:10
  • tesseract should read it if you use the arguments as suggested for individual characters. – fmw42 Jul 19 '23 at 15:53
  • @ArmaanShah please provide a [MWE](https://stackoverflow.com/help/minimal-reproducible-example) including the code that you have tried and the corresponding parameters. – Bilal Jul 19 '23 at 16:25
  • @Bilal I've edited the post, I hope this is enough. – Armaan Shah Jul 22 '23 at 08:06

0 Answers