
My goal is to detect digits from 0 to 9 on a random background. I wrote a dataset generator with the following features:

  • Grayscale data
  • Random digit rotation
  • Random digit blur
  • 43 different fonts
  • Random noisy blurred background
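Such a generator can be sketched roughly as follows. This is a hypothetical minimal version using NumPy and scipy.ndimage; font rasterization is omitted and a plain bright square stands in for a digit glyph, and all parameter names and ranges are illustrative assumptions, not the actual generator:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

def make_sample(digit_img, angle_range=30, blur_sigma=1.0, size=28):
    """Compose one grayscale sample: a rotated, blurred digit on a noisy blurred background."""
    # random digit rotation
    angle = rng.uniform(-angle_range, angle_range)
    digit = ndimage.rotate(digit_img, angle, reshape=False, mode="constant")
    # random digit blur
    digit = ndimage.gaussian_filter(digit, sigma=rng.uniform(0, blur_sigma))
    # random noisy blurred background
    background = ndimage.gaussian_filter(rng.uniform(0, 0.5, (size, size)), sigma=2)
    return np.clip(background + digit, 0, 1)

# stand-in "digit": a bright rectangle (the real generator would rasterize a font glyph)
glyph = np.zeros((28, 28))
glyph[8:20, 10:18] = 1.0
sample = make_sample(glyph)
```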

Here are 1024 samples of my dataset: [image: 1024 test-set samples]

I adapted the MNIST "expert" model to train on this dataset and get almost 100% accuracy on the training and validation sets.

On the test set I get approximately 80% correct. Here is a sample; the green digit is the predicted one:

[image: a 9 predicted as 5]

It seems that my model has trouble distinguishing between:

  • 1 and 7

  • 8 and 3

  • 9 and 6

  • 5 and 9

I need to detect the digit on any background because the test images are not always binary images.

Now my questions:

For the test-set generator:

  • How useful is applying digit rotation? When I rotate a 7, some fonts make it look like a 1; when I rotate a 9 by more than 90°, it becomes a 6.

  • Do the convolution filters already handle image rotation?

  • Are 180,000 image samples enough to train the model?

For the model:

  • Should I increase the image size from 28×28 to 56×56 when I apply a blur filter to the dataset?

  • What filter size should I use?

  • Do I have to increase the number of hidden layers?

Thanks a lot for any guidance.

Tobias Ernst
  • Something you didn't ask which is really the crux of the problem: **"how can I achieve better than 80% on the test set?"** If that was your question I'd be interested in answering. Basically, you need regularization. You stated that you are able to get 100% on the training set but only 80% on the test set. That is the clearest indication of overfitting and a lack of regularization, which needs to be fixed first. – Anton Codes Jun 09 '17 at 14:26
  • Thanks for your answer. I use dropout with a dropout rate of 0.05. The result is slightly better now, but I still have trouble on the test set: 8 is recognized as 3 or 0. What can I do? – Tobias Ernst Jun 10 '17 at 08:15

2 Answers


If you are stuck with the varying image backgrounds, I suggest you try image thresholding, which will give all your images the same foreground and background, assuming the images are of good quality.

Try this (scikit-image library):

    import numpy as np
    from skimage import filters as flt

    # Li's cross-entropy threshold: True for foreground pixels, False for background
    filtered_image = np.array(original_image > flt.threshold_li(original_image))

Then you can use the filtered images for both training and prediction.
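As a quick sanity check of this thresholding, here is the same call on a synthetic image (a bright square on a dark background; the pixel values are assumptions chosen purely for illustration):

```python
import numpy as np
from skimage import filters as flt

# synthetic image: dark background (10) with a bright "digit" region (200)
img = np.full((28, 28), 10, dtype=np.uint8)
img[8:20, 8:20] = 200

# the mask should keep the bright region and drop the background
mask = img > flt.threshold_li(img)
```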

Bo Shao

I ended up extracting the dataset patches from existing images instead of compositing random digits onto random backgrounds. This gives less variance and much better accuracy on the test set.

Here is a working but not especially performant implementation that lets us define the patch shape and stride sizes:

import numpy as np

def patchify(arr, shape, stride):
    """Slide a window of the given shape over arr and collect the patches."""
    patches = []
    (shape_h, shape_w) = shape
    (stride_h, stride_w) = stride
    # count only windows that fit entirely inside the array,
    # so no patch runs past the image border
    num_patches_row = (arr.shape[0] - shape_h) // stride_h + 1
    num_patches_col = (arr.shape[1] - shape_w) // stride_w + 1

    for row in range(num_patches_row):
        row_from = row * stride_h
        row_to = row_from + shape_h

        for col in range(num_patches_col):
            col_from = col * stride_w
            col_to = col_from + shape_w

            origin_information = (row_from, row_to, col_from, col_to)
            roi = arr[row_from:row_to, col_from:col_to]
            patches.append((roi, origin_information))
    return patches

Alternatively, we can use scikit-learn, where image is a NumPy array:

    from sklearn.feature_extraction.image import extract_patches_2d

    patches = extract_patches_2d(image, (patch_height, patch_width))
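For reference, a small sanity check of the scikit-learn call on a toy array (the 4×4 size and 2×2 patch shape are chosen purely for illustration):

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

image = np.arange(16).reshape(4, 4)

# every 2x2 window, in row-major order: (4-2+1) * (4-2+1) = 9 patches
patches = extract_patches_2d(image, (2, 2))
```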
Tobias Ernst