I am working on a problem where I want to automatically read the number in images such as the following:
As can be seen, the images are quite challenging! Not only are the strokes of the digits not always connected, but the contrast also varies a lot. My first attempt was to use pytesseract after some preprocessing. I also created a StackOverflow post about this here.
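For reference, a pipeline of that kind looks roughly like the minimal sketch below. It is not the exact code from my StackOverflow post. It assumes a grayscale `uint8` NumPy array with dark digits on a lighter background (invert with `255 - gray` otherwise), picks a global threshold with Otsu's method implemented in NumPy, and asks Tesseract for a single text line (`--psm 7`) restricted to digits via `tessedit_char_whitelist` (whether the whitelist is honoured depends on the Tesseract version). The helper names `otsu_threshold`, `binarize` and `read_number` are hypothetical.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's global threshold for a uint8 grayscale image (pixels <= t are class 0)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # probability of class 0 for each t
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean of class 0 for each t
    mu_total = mu[-1]
    # Between-class variance; 0/0 where one class is empty is mapped to 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0)
    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Black (0) where gray <= Otsu threshold, white (255) elsewhere."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)


def read_number(gray: np.ndarray) -> str:
    # Imported lazily: needs pytesseract, Pillow and an installed Tesseract binary.
    import pytesseract
    from PIL import Image

    binary = binarize(gray)
    # --psm 7: treat the image as a single text line; whitelist limits output to digits.
    config = "--psm 7 -c tessedit_char_whitelist=0123456789"
    return pytesseract.image_to_string(Image.fromarray(binary), config=config).strip()
```

A single global threshold is exactly what breaks down when the contrast varies between images, which is why this alone does not generalize.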
While this approach works fine on an individual image, it does not generalize, as the preprocessing requires too much manual tuning. The best solution I have so far is to iterate over several hyperparameters, such as the threshold value, the filter size for erosion/dilation, etc. However, this is computationally expensive!
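The parameter sweep can be sketched as an exhaustive grid search. This is an illustration under assumptions, not my exact code: it assumes a grayscale NumPy image with dark digits, uses a morphological closing (dilation followed by erosion, written with NumPy's `sliding_window_view`) to reconnect broken strokes, and takes the OCR engine as a caller-supplied function `ocr` so that it can wrap pytesseract. The names `morph` and `search_preprocessing` are hypothetical. It also makes the cost explicit: in the worst case, every threshold/kernel combination costs one OCR call.

```python
import itertools
import re
from typing import Callable, Iterable, Optional, Tuple

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def morph(mask: np.ndarray, k: int, op: str) -> np.ndarray:
    """Binary erosion ("erode") or dilation ("dilate") with an odd k x k square kernel."""
    pad = k // 2
    # Pad with True for erosion and False for dilation so the border does not alter results.
    padded = np.pad(mask, pad, mode="constant", constant_values=(op == "erode"))
    windows = sliding_window_view(padded, (k, k))
    return windows.all(axis=(-2, -1)) if op == "erode" else windows.any(axis=(-2, -1))


def search_preprocessing(
    gray: np.ndarray,
    ocr: Callable[[np.ndarray], str],
    thresholds: Iterable[int],
    kernel_sizes: Iterable[int],
) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Try every (threshold, kernel size) pair; return the first all-digit OCR result."""
    digits_only = re.compile(r"\d+")
    for t, k in itertools.product(thresholds, kernel_sizes):
        mask = gray < t  # True = digit pixel; assumes dark digits on a lighter background
        if k > 1:
            # Closing (dilate, then erode) bridges small gaps in broken strokes.
            mask = morph(morph(mask, k, "dilate"), k, "erode")
        text = ocr(np.where(mask, 0, 255).astype(np.uint8)).strip()
        if digits_only.fullmatch(text):
            return text, (t, k)
    return None  # no combination produced a plausible number
```

With pytesseract, `ocr` could be something like `lambda img: pytesseract.image_to_string(img, config="--psm 7")`. Accepting the first all-digit output is a crude stopping rule and can still accept a wrong number.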
Therefore, I have come to believe that the solution I am looking for must be deep-learning-based. I have two ideas here:
- Using a network pre-trained on a similar task
- Splitting the input images into separate digits and training/fine-tuning a network myself, MNIST-style
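For the second idea, each digit crop would first have to look like an MNIST sample. MNIST digits were size-normalized to fit a 20x20 box (keeping the aspect ratio) and centered in a 28x28 image by center of mass. The sketch below is a simplified, NumPy-only version of that step: it assumes the crop is white-on-black (ink = high values), uses nearest-neighbour resizing, and centers the bounding box rather than the center of mass. `to_mnist_format` is a hypothetical helper name.

```python
import numpy as np


def to_mnist_format(digit: np.ndarray, box: int = 20, size: int = 28) -> np.ndarray:
    """Turn one white-on-black digit crop into a size x size float image in [0, 1]."""
    ys, xs = np.nonzero(digit)
    if ys.size == 0:
        return np.zeros((size, size), dtype=np.float32)
    # Crop to the bounding box of the ink.
    crop = digit[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(np.float32)
    # Scale so the longer side fits into a box x box square, keeping the aspect ratio.
    scale = box / max(crop.shape)
    h = max(1, round(crop.shape[0] * scale))
    w = max(1, round(crop.shape[1] * scale))
    # Nearest-neighbour resize by index sampling (no OpenCV/PIL dependency).
    rows = np.minimum((np.arange(h) / scale).astype(int), crop.shape[0] - 1)
    cols = np.minimum((np.arange(w) / scale).astype(int), crop.shape[1] - 1)
    resized = crop[np.ix_(rows, cols)]
    # Paste into the middle of the canvas (bounding-box centering, not center of mass).
    out = np.zeros((size, size), dtype=np.float32)
    top, left = (size - h) // 2, (size - w) // 2
    out[top:top + h, left:left + w] = resized
    return out / out.max()
```

Matching the MNIST layout this way is what would make it possible to start from an MNIST-trained network and fine-tune it on my own crops.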
Regarding the first approach, I have not found anything good yet. Does anyone have a suggestion?
Regarding the second approach, I would first need a method to automatically extract images of the individual digits. I guess this should also be deep-learning-based. Afterward, I could perhaps achieve good results with some data augmentation.
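As a non-deep-learning baseline for the splitting step, a vertical projection profile (count ink pixels per column and cut at empty columns) can be sketched as below, together with a simple augmentation that varies contrast, brightness, position and noise, since contrast is one of the main difficulties here. This is a swapped-in classical technique that I have not validated on these images. It assumes a boolean mask (True = ink) for splitting and float images in [0, 1] with an empty border (as in the 28x28 MNIST layout) for augmentation; `split_digits` and `augment` are hypothetical names.

```python
import numpy as np


def split_digits(mask: np.ndarray, min_width: int = 2) -> list:
    """Return (start, end) column ranges of ink runs separated by empty columns."""
    ink_per_column = mask.sum(axis=0)
    segments, start = [], None
    for x, count in enumerate(ink_per_column):
        if count > 0 and start is None:
            start = x  # a new ink run begins
        elif count == 0 and start is not None:
            if x - start >= min_width:  # drop runs too narrow to be a digit
                segments.append((start, x))
            start = None
    if start is not None and mask.shape[1] - start >= min_width:
        segments.append((start, mask.shape[1]))  # run touching the right edge
    return segments


def augment(digit: np.ndarray, rng: np.random.Generator, max_shift: int = 2) -> np.ndarray:
    """Random contrast/brightness, shift and Gaussian noise for a float image in [0, 1]."""
    contrast = rng.uniform(0.4, 1.0)  # mimic faint, low-contrast digits
    brightness = rng.uniform(0.0, 1.0 - contrast)
    out = digit * contrast + brightness
    # np.roll wraps around; harmless as long as the border around the digit is empty.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))
    out = out + rng.normal(0.0, 0.05, size=out.shape)
    return np.clip(out, 0.0, 1.0).astype(np.float32)
```

A projection profile fails when digits touch, and a broken stroke can leave an empty column inside a single digit, which is exactly the disconnected-lines problem, so `min_width` (and possibly merging very close segments) would need tuning.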
Does anyone have ideas? :)