
I want to test my clustering algorithms on handwritten text data, so I'm looking for a dataset of handwritten text (e.g. words) with features already extracted (the goal is to test my clustering algorithms, not to extract features). Does anyone have any information on this?

Thanks.

shn

2 Answers


There is a dataset of images of handwritten digits (MNIST): http://yann.lecun.com/exdb/mnist/

cyborg
  • Yes, I've already tested on this database, using the 28×28 pixel values of each image as the feature vector. But I would prefer features that have already been extracted (descriptors) from a set of handwritten words, characters, or digits. – shn Dec 22 '11 at 14:01
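The "raw pixels as features" approach from the comment above can be sketched as follows. This is a minimal stdlib-only sketch, assuming the MNIST image files follow the IDX layout described on the MNIST page: a big-endian header (magic number `0x00000803`, image count, rows, columns) followed by one unsigned byte per pixel. The function names `parse_idx_images` and `load_idx_images` are hypothetical; each 28×28 image becomes one flat 784-dimensional vector that a clustering algorithm can consume.

```python
import gzip
import struct
from typing import List


def parse_idx_images(data: bytes) -> List[List[int]]:
    """Parse an IDX image file (the MNIST image format) into flat pixel vectors.

    Assumed header layout (big-endian uint32): magic 0x00000803, number of
    images, number of rows, number of columns; then one unsigned byte per pixel.
    """
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 0x00000803:
        raise ValueError(f"not an IDX image file (magic={magic:#x})")
    size = rows * cols  # 28 * 28 = 784 for MNIST
    pixels = data[16:]
    if len(pixels) < count * size:
        raise ValueError("file is truncated")
    # Each image becomes one feature vector of length rows * cols (values 0..255).
    return [list(pixels[i * size:(i + 1) * size]) for i in range(count)]


def load_idx_images(path: str) -> List[List[int]]:
    """Read a (possibly gzip-compressed) IDX image file from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return parse_idx_images(f.read())
```

For example, `load_idx_images("train-images-idx3-ubyte.gz")` would return one 784-element list per training image. For real use, a NumPy array would be far more memory-efficient than nested lists; plain lists are used here only to keep the sketch dependency-free.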

Texmex (corpus-texmex) has 128-dimensional SIFT vectors intended "to evaluate the quality of approximate nearest neighbors search algorithm on different kinds of data and varying database sizes", but I don't know what images they were extracted from; you could try asking the authors.

denis
  • The corpus-texmex dataset is intended only for the evaluation of approximate nearest neighbor search methods. – shn Feb 03 '12 at 16:30
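If you want to inspect the Texmex SIFT vectors anyway, they are distributed in `.fvecs` files. As far as I know, each record is stored little-endian as a 32-bit integer dimension `d` followed by `d` 32-bit floats. The following is a minimal stdlib-only reader under that assumption; the names `parse_fvecs` and `read_fvecs` are hypothetical.

```python
import struct
from typing import List


def parse_fvecs(data: bytes) -> List[List[float]]:
    """Parse .fvecs records: int32 dimension d, then d float32 values (little-endian)."""
    vectors = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError(f"truncated header at byte {offset}")
        (dim,) = struct.unpack_from("<i", data, offset)
        start = offset + 4
        end = start + 4 * dim
        if dim <= 0 or end > len(data):
            raise ValueError(f"corrupt record at byte {offset}")
        # Read the d float32 components of this vector.
        vectors.append(list(struct.unpack_from(f"<{dim}f", data, start)))
        offset = end
    return vectors


def read_fvecs(path: str) -> List[List[float]]:
    """Read every vector from an .fvecs file on disk."""
    with open(path, "rb") as f:
        return parse_fvecs(f.read())
```

For SIFT data, every vector returned should have 128 components. Note that, as the comment above says, these vectors come with no handwriting labels, so they would not answer the original question directly.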