Preprocessing an image for MNIST OCR

Question

I'm working on an OCR application in Python to read digits. I'm using OpenCV to find the contours in an image, crop it, and then preprocess the crop to 28x28 to match the MNIST dataset. My images are not square, so I seem to lose a lot of quality when I resize them. Any tips or suggestions I could try?

This is the original image

This is after editing it

And this is the quality it should be

I've tried some tricks from http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html , such as dilation and opening, but they don't make it better; they only make the digit blurry.

This is the code I'm using (find the contour, crop it, resize it, threshold it, and then center it):

import numpy as np
import cv2
import imutils
import scipy
from imutils.perspective import four_point_transform
from scipy import ndimage

images = np.zeros((4, 784))
correct_vals = np.zeros((4, 10))

i = 0


def getBestShift(img):
    cy, cx = ndimage.center_of_mass(img)

    rows, cols = img.shape
    shiftx = np.round(cols / 2.0 - cx).astype(int)
    shifty = np.round(rows / 2.0 - cy).astype(int)

    return shiftx, shifty


def shift(img, sx, sy):
    rows, cols = img.shape
    M = np.float32([[1, 0, sx], [0, 1, sy]])
    shifted = cv2.warpAffine(img, M, (cols, rows))
    return shifted


for no in [1, 3, 4, 5]:
    image = cv2.imread("images/" + str(no) + ".jpg")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(blurred, 50, 200, 255)

    cnts = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)
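    # findContours returns (contours, hierarchy) in OpenCV 2 and 4,
    # and (image, contours, hierarchy) in OpenCV 3
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]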
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
    displayCnt = None

    for c in cnts:
        # approximate the contour
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)

        # if the contour has four vertices, then we have found
        # the thermostat display
        if len(approx) == 4:
            displayCnt = approx
            break

    warped = four_point_transform(gray, displayCnt.reshape(4, 2))
    gray = cv2.resize(255 - warped, (28, 28))
    (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)


    while np.sum(gray[0]) == 0:
        gray = gray[1:]

    while np.sum(gray[:, 0]) == 0:
        gray = np.delete(gray, 0, 1)

    while np.sum(gray[-1]) == 0:
        gray = gray[:-1]

    while np.sum(gray[:, -1]) == 0:
        gray = np.delete(gray, -1, 1)

    rows, cols = gray.shape

    if rows > cols:
        factor = 20.0 / rows
        rows = 20
        cols = int(round(cols * factor))
        gray = cv2.resize(gray, (cols, rows))

    else:
        factor = 20.0 / cols
        cols = 20
        rows = int(round(rows * factor))
        gray = cv2.resize(gray, (cols, rows))

    # pad the 20-pixel digit out to 28x28, splitting the padding evenly
    colsPadding = (int(np.ceil((28 - cols) / 2.0)), int(np.floor((28 - cols) / 2.0)))
    rowsPadding = (int(np.ceil((28 - rows) / 2.0)), int(np.floor((28 - rows) / 2.0)))
    gray = np.pad(gray, (rowsPadding, colsPadding), 'constant')

    shiftx, shifty = getBestShift(gray)
    shifted = shift(gray, shiftx, shifty)
    gray = shifted

    cv2.imwrite("processed/" + str(no) + ".png", gray)
    cv2.imshow("imgs", gray)
    cv2.waitKey(0)

Rather than the image not being square, it seems to me that the issue lies in the thickness of the lines relative to the width/height of the image. E.g. the line is roughly 8 pixels wide, which is ~1/25 of the width of the digit, or ~1/70 of the width of the white rectangular area. This will cause the resized symbols to look very faint. I'd try to "fatten" them before scaling down. Also, the Otsu threshold might be doing you a disservice there. Try picking a good threshold manually, and see if that makes things better. - Dan Mašek, Apr 12 '18 at 14:07
@DanMašek, the Otsu thresholding was doing me a disservice, yeah. It's a little better without it, but still not good enough. Any idea how I can add a black square behind it, so I can resize it a bit better? - Casper, Apr 12 '18 at 14:29
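One way to put a black square behind the crop, as asked in the comment above, is to pad the shorter side with zeros before resizing. The sketch below is only an illustration, not code from either commenter: `pad_to_square`, `scaled_stroke_width`, and the `margin` parameter are hypothetical names, the 200x80 array is a synthetic stand-in for a cropped and inverted digit, and the 560-pixel width is just the rough figure implied by the "~1/70" estimate. It also works through the stroke-width arithmetic from the first comment.

```python
import numpy as np


def scaled_stroke_width(stroke_px, source_px, target_px=28):
    """Stroke width in pixels after scaling source_px down to target_px."""
    return stroke_px * target_px / source_px


def pad_to_square(img, margin=0):
    """Center a 2-D grayscale image on a black (zero) square canvas.

    `margin` is a hypothetical extra border, in pixels, added on every side.
    """
    rows, cols = img.shape
    side = max(rows, cols) + 2 * margin
    top = (side - rows) // 2
    left = (side - cols) // 2
    # np.pad takes ((before, after), (before, after)) for rows and columns.
    return np.pad(
        img,
        ((top, side - rows - top), (left, side - cols - left)),
        mode="constant",
        constant_values=0,
    )


# Synthetic stand-in for a cropped, inverted digit: a white stroke on black.
digit = np.zeros((200, 80), dtype=np.uint8)
digit[:, 36:44] = 255

square = pad_to_square(digit, margin=20)
print(square.shape)

# An 8 px stroke in a ~560 px wide area shrinks to well under one pixel at 28 px.
print(scaled_stroke_width(8, 560))
```

The square result can then be passed to `cv2.resize` without distorting the aspect ratio. Thickening the strokes first, for example with `cv2.dilate`, would be one way to act on the "fatten" suggestion.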

Zev · Accepted Answer · 2018-04-12T15:00:19.443

When you resize the image, make sure you select the interpolation that best suits your needs. For this, I recommend:

gray = cv2.resize(255 - warped, (28, 28), interpolation=cv2.INTER_AREA)

which results in this image after the rest of your processing.

You can see a comparison of methods here: http://tanbakuchi.com/posts/comparison-of-openv-interpolation-algorithms/ , but since there are only a handful, you can try them all and see which gives the best results. The default for cv2.resize is INTER_LINEAR.
