In section 5.A of a research paper the researcher used the following synthetic datasets:
- GAUSS consisted of six Gaussian clusters with identity covariance, each with 500 points in five dimensions. Their means were randomly assigned a value from zero to 10 in each dimension. Cluster means were required to be at least four Euclidean distance apart, and points were required to be within two Euclidean distance of their cluster mean.
- PAIRED consisted of three pairs of Gaussian clusters with identity covariance, each with 500 points in five dimensions. Each pair of Gaussians was placed around a mean with a randomly assigned value in each dimension from zero to 20 such that the Euclidean distance between paired Gaussian clusters was between four and eight, and the Euclidean distance between non-paired Gaussians was at least 12. Additionally, points were required to be within two Euclidean distance of their cluster mean.
- ELONG consisted of five Gaussian clusters with identity covariance, each with 300 points in five dimensions. Their means were randomly assigned a value from zero to 50 in each dimension. To create elongated clusters in different dimensions, we multiplied the values of a single, distinct dimension for each cluster by 15. Cluster means were required to be at least five Euclidean distance apart.
- UNIFORM consisted of eight clusters, each with 300 points in three dimensions. Each cluster had its points uniformly distributed in a 3x3x3 box around a randomly assigned center in a 10x10x10 cube. Cluster centers were required to be five Euclidean distance apart.
- RINGS consisted of two ring clusters centered at (0,0): a larger outer ring with radius 2 and a smaller inner ring with radius 1. 400 points were evenly spaced by degrees on the inner ring.
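For the Gaussian datasets, one possible approach is plain rejection sampling with NumPy. The sketch below is only my reading of the descriptions, not the paper's actual procedure: the function names are made up, the means are drawn uniformly, points outside the radius limit are simply redrawn, and for ELONG I assume the factor of 15 is applied to each point's deviation from its cluster mean along one dimension.

```python
import numpy as np

def sample_separated_means(rng, k, dim, high, min_dist, max_tries=100_000):
    """Draw k means uniformly in [0, high]^dim, rejecting candidates
    that fall closer than min_dist to an already accepted mean."""
    means = []
    for _ in range(max_tries):
        candidate = rng.uniform(0.0, high, size=dim)
        if all(np.linalg.norm(candidate - m) >= min_dist for m in means):
            means.append(candidate)
            if len(means) == k:
                return np.array(means)
    raise RuntimeError("could not place all means; relax the constraints")

def sample_truncated_gaussian(rng, mean, n, max_radius):
    """Draw n points from N(mean, I), keeping only those within
    max_radius (Euclidean) of the mean."""
    dim = mean.shape[0]
    kept, count = [], 0
    while count < n:
        batch = rng.standard_normal(size=(4 * n, dim))
        batch = batch[np.linalg.norm(batch, axis=1) <= max_radius]
        kept.append(batch)
        count += len(batch)
    return mean + np.concatenate(kept)[:n]

def make_gauss(seed=0):
    """GAUSS as described: 6 clusters, 500 points each, 5 dimensions."""
    rng = np.random.default_rng(seed)
    means = sample_separated_means(rng, k=6, dim=5, high=10.0, min_dist=4.0)
    X = np.vstack([sample_truncated_gaussian(rng, m, 500, 2.0) for m in means])
    y = np.repeat(np.arange(6), 500)
    return X, y, means

def make_elong(seed=0):
    """ELONG as described: 5 clusters, 300 points each, 5 dimensions."""
    rng = np.random.default_rng(seed)
    means = sample_separated_means(rng, k=5, dim=5, high=50.0, min_dist=5.0)
    parts = []
    for i, m in enumerate(means):
        noise = rng.standard_normal(size=(300, 5))
        # Assumption: stretch the deviation from the mean along dimension i,
        # so that cluster i is elongated in its own distinct dimension.
        noise[:, i] *= 15.0
        parts.append(m + noise)
    X = np.vstack(parts)
    y = np.repeat(np.arange(5), 300)
    return X, y, means

X, y, means = make_gauss(seed=0)
print(X.shape, y.shape)
```

PAIRED could presumably be built from the same two helpers by first placing three pair centers and then offsetting two means around each one, but the description leaves the exact placement open, so I have not tried to guess it here.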
http://postimg.org/image/jo4rjztjz/
I don't have these datasets. I tried contacting the researcher, but got no response.
How can I create these datasets? Is there a tool for generating them?
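As far as I can tell, no dedicated tool is needed; NumPy alone seems sufficient. Here is a rough sketch for UNIFORM and RINGS under some assumptions of mine: the function names are hypothetical, I read "five Euclidean distance apart" as "at least five apart", the 3x3x3 box is centered on the cluster center, and since the description gives no point count for the outer ring, `n_outer` is left as a parameter whose value below is only a placeholder.

```python
import numpy as np

def place_centers(rng, k, dim, side, min_dist,
                  tries_per_center=1000, max_restarts=1000):
    """Greedily place k centers in [0, side]^dim that are at least
    min_dist apart; start over if the placement gets stuck."""
    for _ in range(max_restarts):
        centers = []
        for _ in range(k):
            for _ in range(tries_per_center):
                c = rng.uniform(0.0, side, size=dim)
                if all(np.linalg.norm(c - p) >= min_dist for p in centers):
                    centers.append(c)
                    break
            else:
                break  # no valid spot found for this center; restart
        if len(centers) == k:
            return np.array(centers)
    raise RuntimeError("could not place all centers; relax the constraints")

def make_uniform(seed=0, k=8, n=300, box=3.0, side=10.0, min_dist=5.0):
    """UNIFORM as described: k clusters of n points, each uniform in a
    box x box x box cube around its center."""
    rng = np.random.default_rng(seed)
    centers = place_centers(rng, k, 3, side, min_dist)
    half = box / 2.0
    X = np.vstack([c + rng.uniform(-half, half, size=(n, 3)) for c in centers])
    y = np.repeat(np.arange(k), n)
    return X, y, centers

def make_rings(n_inner=400, n_outer=400, r_inner=1.0, r_outer=2.0):
    """RINGS as described: two concentric rings around (0, 0) with points
    evenly spaced by angle. n_outer is an assumed value."""
    def ring(n, r):
        theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
        return np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    X = np.vstack([ring(n_inner, r_inner), ring(n_outer, r_outer)])
    y = np.concatenate([np.zeros(n_inner, dtype=int),
                        np.ones(n_outer, dtype=int)])
    return X, y

X_uniform, y_uniform, centers = make_uniform(seed=0)
X_rings, y_rings = make_rings(n_inner=400, n_outer=400)
print(X_uniform.shape, X_rings.shape)
```

Fixing the seed keeps the generated data reproducible, which matters if the goal is to compare clustering results across runs.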