9

I am doing research on machine learning and want to test my algorithms on some well-known datasets. Since I am a newbie in this area, I haven't found any suitable datasets apart from MNIST. I think MNIST is quite suitable for our research. Does anyone know of datasets similar to MNIST?

P.S. I know of another commonly used handwritten digit dataset, the USPS dataset. But I need a dataset with more training examples (typically more than 10,000, comparable to the number of training examples in MNIST), so USPS is ruled out.

nbro
Nothing More
  • This depends on what you want to do. MNIST is a great dataset that contains handwritten digits. Do you want to work on handwritten digits or something else (faces, handwritten letters, etc)? – Ove Mar 23 '13 at 08:43
  • You can find an already decoded version of the MNIST dataset here: http://mnist-decoded.000webhostapp.com/ – SomethingSomething Mar 31 '17 at 14:18

3 Answers

5

The UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/) contains a wide variety of datasets, including some that, like MNIST, are suitable for classification, e.g. Skin Segmentation (http://archive.ics.uci.edu/ml/datasets/Skin+Segmentation).

I can't say which of them would be suitable without knowing what you're trying to demonstrate with your algorithm, but the datasets in the UCI archive are well known.

corrin
4

You can try Fashion MNIST or Kuzushiji MNIST, which have very similar properties to MNIST but are a bit harder to predict. From Fashion MNIST's page:

Seriously, we are talking about replacing MNIST. Here are some good reasons:

  • MNIST is too easy. Convolutional nets can achieve 99.7% on MNIST. Classic machine learning algorithms can also achieve 97% easily. Check out our side-by-side benchmark for Fashion-MNIST vs. MNIST, and read "Most pairs of MNIST digits can be distinguished pretty well by just one pixel."
  • MNIST is overused. In this April 2017 Twitter thread, Google Brain research scientist and deep learning expert Ian Goodfellow calls for people to move away from MNIST.
  • MNIST cannot represent modern CV tasks, as noted in this April 2017 Twitter thread by deep learning expert/Keras author François Chollet.
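
Fashion MNIST and Kuzushiji MNIST are distributed in the same IDX file format as MNIST, so the same reader should work for all three. Below is a minimal sketch of such a reader using only the standard library. The function names `read_idx`, `parse_idx` and `image_rows` are my own (hypothetical), and the sketch only handles the unsigned-byte variant of the format that these datasets use.

```python
import gzip
import struct


def parse_idx(raw):
    """Parse the bytes of an IDX file into (dims, data).

    dims is a tuple of dimension sizes, data is the flat payload in bytes.
    """
    # Magic number: two zero bytes, a type code, and the number of dimensions.
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("this sketch only handles unsigned-byte IDX files")
    # Each dimension size is a 4-byte big-endian unsigned integer.
    header_end = 4 + 4 * ndim
    dims = struct.unpack(">" + "I" * ndim, raw[4:header_end])
    data = raw[header_end:]
    expected = 1
    for d in dims:
        expected *= d
    if len(data) != expected:
        raise ValueError("payload size does not match the header")
    return dims, data


def read_idx(path):
    """Read an IDX file from disk, transparently handling .gz files."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        return parse_idx(f.read())


def image_rows(dims, data, index):
    """Return image number `index` as a list of rows of pixel values."""
    _, rows, cols = dims
    start = index * rows * cols
    return [list(data[start + r * cols:start + (r + 1) * cols])
            for r in range(rows)]
```

For example, assuming you have downloaded a file named `train-images-idx3-ubyte.gz`, `dims, data = read_idx("train-images-idx3-ubyte.gz")` should give `dims == (60000, 28, 28)` for the training images of any of these three datasets.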
aliakbars
0

I know this question is old, but I hope my suggestions can still be useful. I was also looking for datasets similar to MNIST and Fashion MNIST. PyTorch (through torchvision) provides several of them with documentation: KMNIST, QMNIST, USPS, SEMEION, SVHN, amongst others. Check here for the full list.
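
As a rough sketch of how these could be loaded, the snippet below wraps the `torchvision.datasets` classes named above. The helper names (`large_enough`, `load_training_set`) and the `DATASETS` table are my own, not part of torchvision. The training-set sizes are the figures I know from the dataset descriptions, and the constructor arguments reflect torchvision as I understand it, so please verify both against the documentation for your version. Note that SVHN selects its split with `split=` rather than `train=`, and SEMEION has no predefined train/test split.

```python
# Approximate training-set sizes and the keyword arguments that select the
# training split. Both are assumptions to double-check against the docs.
DATASETS = {
    "KMNIST": {"size": 60000, "kwargs": {"train": True}},
    "QMNIST": {"size": 60000, "kwargs": {"train": True}},
    "USPS": {"size": 7291, "kwargs": {"train": True}},
    "SEMEION": {"size": 1593, "kwargs": {}},  # single set, no split argument
    "SVHN": {"size": 73257, "kwargs": {"split": "train"}},
}


def large_enough(min_size):
    """Names of datasets whose training set has more than min_size examples."""
    return [name for name, info in DATASETS.items() if info["size"] > min_size]


def load_training_set(name, root="./data"):
    """Download (if needed) and return the training split of a dataset."""
    if name not in DATASETS:
        raise KeyError("unknown dataset: " + name)
    # Imported lazily so the table above is usable without torchvision.
    from torchvision import datasets
    dataset_cls = getattr(datasets, name)
    return dataset_cls(root=root, download=True, **DATASETS[name]["kwargs"])
```

With the question's requirement of more than 10,000 training examples, `large_enough(10000)` leaves KMNIST, QMNIST and SVHN. A call such as `load_training_set("KMNIST")` needs torchvision installed and network access on first use.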

Joy