I am working on a university project to detect letters in a photo. I can successfully extract words from the photo and cut them into single letters, which are black on a white background. The resulting images look quite clear.
I have trained the SVC classifier from the Python scikit-learn library as follows:
from sklearn import svm
classifier = svm.SVC(gamma=0.001)
It was trained on about 800 letters, which I obtained from words using my own scripts. The classifier predicts letters very well when it is given letters it was trained on. However, when I provide a new letter obtained with the same script from a different word, it fails every single time. The old and new examples seem to look very similar.
Can you give me any tips on how to improve this situation?
I have also trained this classifier on 26k letters from a ready-made subset available online. The result was the same: perfect on training data, failure on new data.
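The gap between training accuracy and accuracy on new letters can be measured explicitly with a held-out split. The following is only a minimal sketch under stated assumptions: the letter images are not available here, so scikit-learn's bundled 8x8 digit images are used as placeholder data, and the 25% split size and `random_state` are arbitrary choices, not part of the original setup.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Placeholder data (assumption): the bundled digit images stand in for the
# letter images, which are not available in this sketch.
digits = datasets.load_digits()
X = digits.images.reshape(len(digits.images), -1)  # flatten each image to a vector
y = digits.target

# Hold out 25% of the samples so the classifier never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Same classifier settings as in the question.
classifier = svm.SVC(gamma=0.001)
classifier.fit(X_train, y_train)

# Compare accuracy on seen versus unseen samples.
train_acc = classifier.score(X_train, y_train)
test_acc = classifier.score(X_test, y_test)
print(f"training accuracy: {train_acc:.3f}")
print(f"held-out accuracy: {test_acc:.3f}")
```

With the letter data, `X` would hold the flattened letter images and `y` their labels. A large difference between the two printed numbers would correspond to the behaviour described above.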