I am trying to create an iOS app that can recognize the numbers 0 to 9 in an image taken with the device's camera. I started out by detecting the number, which in my case will always be inside a blue circle. I managed to get pretty accurate circle detection with OpenCV. At this point the app takes the image, scans it for blue circles, crops it to the region containing the circle, converts it to black and white, and increases the contrast so that there is only pure black (the background) and pure white (the number). The result is a pretty clean image of just the number. The last step would be to recognize that image with a simple image classifier.
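Roughly, the preprocessing looks like the sketch below (shown in Python for readability; the HSV range for "blue", the Hough parameters, and the function name are placeholders, not my exact values):

```python
import cv2
import numpy as np

def extract_number_patch(bgr_image):
    # Mask out everything that is not roughly blue
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    blue_mask = cv2.inRange(hsv, (100, 80, 40), (130, 255, 255))

    # Detect the blue circle on the mask
    circles = cv2.HoughCircles(blue_mask, cv2.HOUGH_GRADIENT, dp=1.5,
                               minDist=100, param1=100, param2=30,
                               minRadius=20, maxRadius=0)
    if circles is None:
        return None
    x, y, r = np.round(circles[0, 0]).astype(int)

    # Crop to the circle and push the contrast to pure black / pure white
    patch = bgr_image[max(y - r, 0):y + r, max(x - r, 0):x + r]
    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    return binary
```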
So I tried to recreate such "white number on black background" images for a dataset. I used images of the numbers in the same font as in reality, applied random contrast, random brightness, and random scale, added a blue circle, and passed the result to the same OpenCV function, which returned an image that I saved to my hard drive. The dataset I created had over 10,000 images per number (so over 100,000 in total). I then used CreateML to train an image classifier on that dataset. However, the accuracy in the actual app, with real photos of such numbers, is quite bad.
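Condensed, the dataset generation looked something like this (again a Python sketch; the parameter ranges and paths are placeholders, and `extract_number_patch` is the preprocessing shown above):

```python
import random
import cv2

def make_training_sample(digit_template_bgr, out_path):
    img = digit_template_bgr.copy()

    # Random contrast (alpha) and brightness (beta)
    alpha = random.uniform(0.6, 1.4)
    beta = random.uniform(-40, 40)
    img = cv2.convertScaleAbs(img, alpha=alpha, beta=beta)

    # Random scale
    scale = random.uniform(0.7, 1.3)
    img = cv2.resize(img, None, fx=scale, fy=scale)

    # Draw the blue circle around the digit, as it appears in reality
    h, w = img.shape[:2]
    cv2.circle(img, (w // 2, h // 2), min(h, w) // 2 - 2, (255, 0, 0), 4)

    # Run the same OpenCV preprocessing the app uses and save the result
    patch = extract_number_patch(img)
    if patch is not None:
        cv2.imwrite(out_path, patch)
```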
So I tried a different approach. The idea was to vary everything in the images except the number itself, so that the model learns what the samples of each digit have in common. I did this by adding random white and black pixels to the images and by rotating and scaling them. At the end I applied the same black-and-white filter from OpenCV and saved the images to my hard drive. This model performs even worse than the one above.
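The second augmentation was roughly the following (a sketch with placeholder noise levels and angle/scale ranges; the black-and-white filter at the end is the same Otsu threshold as above):

```python
import random
import cv2
import numpy as np

def augment_second_approach(gray_digit):
    img = gray_digit.copy()
    h, w = img.shape[:2]

    # Sprinkle random white and black pixels over the image
    noise = np.random.rand(h, w)
    img[noise < 0.02] = 255
    img[noise > 0.98] = 0

    # Random rotation and scale around the image center
    angle = random.uniform(-15, 15)
    scale = random.uniform(0.8, 1.2)
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    img = cv2.warpAffine(img, matrix, (w, h))

    # Same black-and-white filter as in the app
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    return binary
```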
You can find a sample image of both datasets here: https://1drv.ms/f/s!Ao1FRfDXc7vKklCxq3n7NC6APImP
So here are my questions:
1) Shouldn't it be fairly easy to create a machine learning model that can recognize these numbers with high accuracy?
2) What should my dataset look like in this case to maximize the model's accuracy?
3) How many images per number would you recommend for training?