I am looking for a clustering dataset with "ground truth" labels for some known natural clustering, preferably with high dimensionality.
I found some good candidates here (http://cs.joensuu.fi/sipu/datasets/), but only the Glass and Iris datasets have labels for the points. I also found some code to generate Gaussian datasets (SynDECA). The main reason I want this is to compare distance metrics for some clustering methods. Internal (intrinsic) evaluation criteria are difficult to use for this, as many of them are biased towards Euclidean distances, and there are so many to choose from.
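To make the goal concrete, here is a minimal sketch of the kind of comparison described above, using only the Python standard library. Everything in it is an assumption made for illustration: the isotropic Gaussian generator is a stand-in and not SynDECA, the simple k-medoids routine is just one clustering method that accepts an arbitrary distance function, and the adjusted Rand index is only one of many possible label-based scores. All function names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def make_gaussian_clusters(n_clusters=3, n_per_cluster=50, dim=20, spread=1.0, seed=0):
    """Return (points, labels): isotropic Gaussian blobs with known membership."""
    rng = random.Random(seed)
    points, labels = [], []
    for k in range(n_clusters):
        # Assumption: cluster centers drawn uniformly from [-10, 10] per dimension.
        center = [rng.uniform(-10, 10) for _ in range(dim)]
        for _ in range(n_per_cluster):
            points.append([rng.gauss(c, spread) for c in center])
            labels.append(k)
    return points, labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index: compares two labelings without using any distance."""
    def pairs(n):
        return n * (n - 1) // 2

    sum_cells = sum(pairs(c) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(pairs(c) for c in Counter(truth).values())
    sum_cols = sum(pairs(c) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / pairs(len(truth))
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def assign_to_medoids(points, medoids, dist):
    """Index of the nearest medoid for every point, under the given distance."""
    return [min(range(len(medoids)), key=lambda j: dist(p, points[medoids[j]]))
            for p in points]


def k_medoids(points, k, dist, n_iter=20, seed=0):
    """Simple alternating k-medoids; works with any distance function."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(n_iter):
        assignment = assign_to_medoids(points, medoids, dist)
        new_medoids = []
        for j in range(k):
            members = [i for i, a in enumerate(assignment) if a == j]
            if not members:  # keep the old medoid if its cluster emptied
                new_medoids.append(medoids[j])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(
                members,
                key=lambda i: sum(dist(points[i], points[m]) for m in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign_to_medoids(points, medoids, dist)


points, labels = make_gaussian_clusters()
for name, dist in [("euclidean", euclidean), ("manhattan", manhattan)]:
    predicted = k_medoids(points, 3, dist)
    print(name, round(adjusted_rand_index(labels, predicted), 3))
```

Because the score is computed from the labels alone, it does not favor any particular distance, which is why labeled data is needed here. Synthetic Gaussian blobs are themselves well suited to Euclidean distance, though, so real labeled datasets would still be preferable.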
Thanks!