
My goal is to detect digits from 0 to 9 on a random background. I wrote a dataset generator with the following features:

  • Grayscale data
  • Random digit rotation
  • Random digit blur
  • 43 different fonts
  • Random noisy blurred background
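Such a generator can be sketched roughly as follows. This is a hypothetical minimal version using NumPy and scipy.ndimage; font rasterization is omitted and a plain bright square stands in for a digit glyph, and all parameter names and ranges are illustrative assumptions, not the actual generator:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

def make_sample(digit_img, angle_range=30, blur_sigma=1.0, size=28):
    """Compose one grayscale sample: a rotated, blurred digit on a noisy blurred background."""
    # random digit rotation
    angle = rng.uniform(-angle_range, angle_range)
    digit = ndimage.rotate(digit_img, angle, reshape=False, mode="constant")
    # random digit blur
    digit = ndimage.gaussian_filter(digit, sigma=rng.uniform(0, blur_sigma))
    # random noisy blurred background
    background = ndimage.gaussian_filter(rng.uniform(0, 0.5, (size, size)), sigma=2)
    return np.clip(background + digit, 0, 1)

# stand-in "digit": a bright rectangle (the real generator would rasterize a font glyph)
glyph = np.zeros((28, 28))
glyph[8:20, 10:18] = 1.0
sample = make_sample(glyph)
```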

Here are 1024 samples of my dataset: [image: 1024 test-set samples]

I adapted the MNIST "expert" model to train on this dataset and get almost 100% accuracy on the training and validation sets.

On the test set I get approximately 80% correct. Here is a sample; the green digit is the predicted one:

[image: a 9 predicted as 5]

It seems that my model has trouble distinguishing between:

  • 1 and 7

  • 8 and 3

  • 9 and 6

  • 5 and 9

I need to detect the digit on any background because the test images are not always binary images.

Now my questions:

For the test-set generator:

  • How useful is applying digit rotation? When I rotate a 7, some fonts make it look like a 1; when I rotate a 9 by more than 90°, it becomes a 6.

  • Do the convolution filters already handle image rotation?

  • Are 180,000 image samples enough to train the model?

For the model:

  • Should I increase the image size from 28×28 to 56×56 when I apply a blur filter to the dataset?

  • What filter size should I use?

  • Do I have to increase the number of hidden layers?

Thanks a lot for any guidance.

Tobias Ernst
  • Something you didn't ask which is really the crux of the problem: **"how can I achieve better than 80% on the test set?"** If that was your question I'd be interested in answering. Basically, you need regularization. You stated that you are able to get 100% on the training set but only 80% on the test set. That is the clearest indication of overfitting and a lack of regularization, which needs to be fixed first. – Anton Codes Jun 09 '17 at 14:26
  • Thanks for your answer. I use dropout with a dropout rate of 0.05. The result is slightly better now, but I still have trouble on the test set: 8 is recognized as 3 or 0. What can I do? – Tobias Ernst Jun 10 '17 at 08:15

2 Answers


If you are stuck with the varying image backgrounds, I suggest you try image thresholding, which will give all your images the same foreground and background, assuming the images are of good quality.

Try this (scikit-image library):

    import numpy as np
    from skimage import filters as flt

    # Li's cross-entropy threshold: True for foreground pixels, False for background
    filtered_image = np.array(original_image > flt.threshold_li(original_image))

Then you can use the filtered images for both training and prediction.
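As a quick sanity check of this thresholding, here is the same call on a synthetic image (a bright square on a dark background; the pixel values are assumptions chosen purely for illustration):

```python
import numpy as np
from skimage import filters as flt

# synthetic image: dark background (10) with a bright "digit" region (200)
img = np.full((28, 28), 10, dtype=np.uint8)
img[8:20, 8:20] = 200

# the mask should keep the bright region and drop the background
mask = img > flt.threshold_li(img)
```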

Bo Shao

I ended up extracting the dataset patches from existing images instead of compositing random digits onto random backgrounds. This gives less variance and much better accuracy on the test set.

Here is a working but not especially performant implementation that lets us define the patch shape and stride sizes:

import numpy as np

def patchify(arr, shape, stride):
    """Slide a window of the given shape over arr and collect the patches."""
    patches = []
    (shape_h, shape_w) = shape
    (stride_h, stride_w) = stride
    # count only windows that fit entirely inside the array,
    # so no patch runs past the image border
    num_patches_row = (arr.shape[0] - shape_h) // stride_h + 1
    num_patches_col = (arr.shape[1] - shape_w) // stride_w + 1

    for row in range(num_patches_row):
        row_from = row * stride_h
        row_to = row_from + shape_h

        for col in range(num_patches_col):
            col_from = col * stride_w
            col_to = col_from + shape_w

            origin_information = (row_from, row_to, col_from, col_to)
            roi = arr[row_from:row_to, col_from:col_to]
            patches.append((roi, origin_information))
    return patches

Alternatively, we can use scikit-learn, where image is a NumPy array:

    from sklearn.feature_extraction.image import extract_patches_2d

    patches = extract_patches_2d(image, (patch_height, patch_width))
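For reference, a small sanity check of the scikit-learn call on a toy array (the 4×4 size and 2×2 patch shape are chosen purely for illustration):

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

image = np.arange(16).reshape(4, 4)

# every 2x2 window, in row-major order: (4-2+1) * (4-2+1) = 9 patches
patches = extract_patches_2d(image, (2, 2))
```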
Tobias Ernst