Training using the custom dataset instead of MNIST

Question

I would like to use a custom dataset that contains images of handwritten characters from a language other than English. I am planning to use the KNN algorithm to classify the handwritten characters.

Here are some of the challenges I am facing at this point:

1. The images are of different sizes. How do I solve this? Is there any ETL (preprocessing) work to be done in Python?
2. Even if we assume they are all the same size, each image would be around 70 * 70 pixels, because the letters are more complex than English ones, with many fine details that distinguish one character from another. How does this affect training and performance?

score 1 · Answer 1 · answered Jul 29 '17 at 11:14

1. Choose a fixed size and resize all images to it (for example with the PIL module).
2. I suppose it depends on the quality of the data and on the language itself. If the letters are complex (like hieroglyphs), recognition will be difficult. Otherwise, if the letters are drawn with thin lines, they could be recognized even in small images.
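A minimal sketch of the resizing step with PIL (Pillow) and NumPy, assuming dark ink on a white background. The target size of 32 x 32, the padding-to-square step and the helper names `pad_to_square`, `to_vector` and `build_matrix` are illustrative choices, not something the answer prescribes. Padding first keeps tall and wide characters from being stretched. The size also matters for point 2: KNN compares a query against every stored training vector, so the work per comparison grows with the number of pixels (70 * 70 = 4900 values versus 32 * 32 = 1024).

```python
import numpy as np
from PIL import Image

# Assumption: 32 x 32 is an illustrative target size, not a recommendation
TARGET_SIZE = (32, 32)  # (width, height), the order PIL uses


def pad_to_square(gray: Image.Image, fill: int = 255) -> Image.Image:
    """Center a grayscale image on a square canvas so resizing does not distort it.

    Assumes dark ink on a white (255) background.
    """
    w, h = gray.size
    side = max(w, h)
    canvas = Image.new("L", (side, side), color=fill)
    canvas.paste(gray, ((side - w) // 2, (side - h) // 2))  # paste at top-left offset
    return canvas


def to_vector(img: Image.Image, size=TARGET_SIZE) -> np.ndarray:
    """Grayscale -> pad to square -> resize -> scale to [0, 1] -> flatten."""
    gray = pad_to_square(img.convert("L"))
    resized = gray.resize(size)
    arr = np.asarray(resized, dtype=np.float32) / 255.0
    return arr.reshape(-1)  # length = width * height


def build_matrix(images, size=TARGET_SIZE) -> np.ndarray:
    """Stack one flattened vector per image into an (n_images, width*height) matrix."""
    return np.stack([to_vector(im, size) for im in images])
```

With real files you would pass `Image.open(path)` for each image path; the resulting matrix has one row per image, which is the input shape a KNN classifier expects.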

Anyway, if the drawn letters are too similar to each other, it would be more difficult to recognize them, of course.
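To see why similar letters are a problem for KNN specifically: it labels an image by a majority vote among the k training images closest to it, so when two letters look alike, their images sit close together and neighbours from the wrong class can outvote the right one. The following is a minimal NumPy sketch over flattened pixel vectors; the function name `knn_predict` and the default `k=3` are illustrative assumptions, and plain Euclidean distance on raw pixels is only one possible choice.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train: np.ndarray, y_train: np.ndarray,
                X_query: np.ndarray, k: int = 3) -> np.ndarray:
    """Label each row of X_query by majority vote of its k nearest training rows.

    X_train: (n_train, d) flattened images; y_train: (n_train,) labels;
    X_query: (n_query, d). Distance is Euclidean on raw pixel values.
    """
    # Squared Euclidean distances via ||q||^2 - 2 q.t + ||t||^2, shape (n_query, n_train).
    # Every query is compared with every training image: O(n_query * n_train * d) work.
    d2 = ((X_query ** 2).sum(axis=1)[:, None]
          - 2.0 * X_query @ X_train.T
          + (X_train ** 2).sum(axis=1)[None, :])
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest training rows
    preds = []
    for row in nearest:
        # Labels are counted in order of distance; on a tied vote, most_common
        # returns the label that was encountered first, i.e. the closer one.
        votes = Counter(y_train[i] for i in row)
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)
```

For real use, scikit-learn's `KNeighborsClassifier` implements the same idea with faster neighbour search; the hand-written version just makes the voting explicit.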

One interesting idea is not to use raw pixels as training data, but to compute some special features instead, as described here: http://archive.ics.uci.edu/ml/datasets/Letter+Recognition
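A small sketch of hand-crafted shape features in the spirit of that dataset (bounding-box size, amount of ink, position and spread of the ink). These seven features and the name `shape_features` are illustrative assumptions; they are not the exact attribute definitions used in the UCI Letter Recognition data. The input is assumed to be a 2-D grayscale array scaled to [0, 1] with dark ink on a light background.

```python
import numpy as np


def shape_features(img: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return 7 size-normalized features of the ink in a 2-D grayscale image.

    [box_width, box_height, ink_fraction, centroid_x, centroid_y, var_x, var_y]
    """
    ink = img < threshold          # boolean mask of "on" (ink) pixels
    ys, xs = np.nonzero(ink)       # row and column indices of ink pixels
    h, w = img.shape
    if xs.size == 0:               # blank image: no ink, all features zero
        return np.zeros(7)
    box_w = (xs.max() - xs.min() + 1) / w   # width of the ink bounding box
    box_h = (ys.max() - ys.min() + 1) / h   # height of the ink bounding box
    ink_frac = ink.mean()                   # fraction of pixels that are ink
    cx, cy = xs.mean() / w, ys.mean() / h   # centroid of the ink
    var_x, var_y = xs.var() / w**2, ys.var() / h**2  # horizontal/vertical spread
    return np.array([box_w, box_h, ink_frac, cx, cy, var_x, var_y])
```

A vector like this has a handful of values instead of thousands of pixels, so KNN distances become cheaper to compute and less sensitive to small shifts of the stroke; whether it separates your characters well enough has to be checked on your own data.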

