
I have 500x500px images, each with one or more handwritten digits drawn on a Tkinter Canvas and saved to a PNG file (example).

Each of these digits is ultimately given to a digit-recognizer neural network that accepts 28x28px images (I use TensorFlow for that).

If the image has only one digit (example), I do not need to segment anything. The image is resized to 28x28px and fed to the digit recognizer (example after resizing). In this case, recognition accuracy is excellent.

My problem occurs when the image has more than one digit. In this case, I segment the individual digits and save each of them in a separate image file. Each of these images is resized and fed to the digit recognizer, but now the accuracy is awful.

This drop in accuracy is due to differences between the features of the resized segmented images and those of the images I trained the neural network with, such as the margin/padding of the image and the thickness of the digit itself.

When I segment a digit and save it to an individual image file, this new file has no padding/margins (example). After resizing it to 28x28px, I get a digit that is much thicker than it should be and still has no padding/margins (example). It looks slightly distorted. This is different from the digit images I used to train the neural network, which is why my accuracy in predicting digits is low.

To fix that, I want to preserve the padding/margins and the thickness of the segmented digits in the final 28x28px image that is fed to the digit recognizer. My idea is, after saving the segmented image (and before resizing it), to paste it in the center of a 500x500px white square. That segmented digit would then be almost identical to the case in which only one digit was written in the original image, and after resizing to 28x28px the margins and thickness would be preserved. How can I implement this idea?
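Concretely, the pasting step I picture would look something like this (pure NumPy, assuming a grayscale crop; `center_on_canvas` is just an illustrative name, and the dummy array stands in for a real saved digit):

```python
import numpy as np

def center_on_canvas(roi, size=500):
    """Paste a grayscale digit crop onto the center of a white square canvas."""
    h, w = roi.shape[:2]
    canvas = np.full((size, size), 255, dtype=roi.dtype)
    top = (size - h) // 2
    left = (size - w) // 2
    canvas[top:top + h, left:left + w] = roi
    return canvas

digit = np.zeros((120, 80), dtype=np.uint8)  # dummy dark crop
out = center_on_canvas(digit)
print(out.shape)  # (500, 500)
```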

The code that I use to segment the digits (credits to Devashish Prasad):

# import the necessary packages
import numpy as np
import cv2
import imutils

# load the image, convert it to grayscale, and blur it to remove noise
image = cv2.imread("sample1.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (7, 7), 0)

# threshold the image
ret,thresh1 = cv2.threshold(gray ,127,255,cv2.THRESH_BINARY_INV)

# dilate the white portions
dilate = cv2.dilate(thresh1, None, iterations=2)

# find contours in the image
cnts = cv2.findContours(dilate.copy(), cv2.RETR_EXTERNAL,
    cv2.CHAIN_APPROX_SIMPLE)
# grab_contours handles the different return signatures across OpenCV versions
cnts = imutils.grab_contours(cnts)

orig = image.copy()
i = 0

for cnt in cnts:
    # Check the area of contour, if it is very small ignore it
    if(cv2.contourArea(cnt) < 100):
        continue

    # Get the bounding box of each filtered contour
    x,y,w,h = cv2.boundingRect(cnt)

    # Take the ROI of the contour
    roi = image[y:y+h, x:x+w]

    # Mark them on the image if you want
    cv2.rectangle(orig,(x,y),(x+w,y+h),(0,255,0),2)

    # Save your contours or characters
    cv2.imwrite("roi" + str(i) + ".png", roi)

    i = i + 1

cv2.imshow("Image", orig)
cv2.waitKey(0)
mvww11

1 Answer


I needed a similar thing in one of my projects. Instead of pasting the cropped image onto the center of a 500x500 white background, I filled the border around the image until it reached 500x500, like this:

ih, iw = 500, 500  # the desired output height and width
h, w = image.shape[:2]

# split any odd pixel between the two sides so the result is exactly 500x500
top, left = (ih - h) // 2, (iw - w) // 2
out_image = cv2.copyMakeBorder(image, top, ih - h - top, left, iw - w - left,
                               cv2.BORDER_CONSTANT, value=255)

For me it seemed easier. Hope it helps.

yilmazdoga