I ran the K-means clustering algorithm against a set of sequence files. However, the generated result looks like this:
0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []
The program I use is adapted from NewsKMeansClustering.java, an example from chapter 9 of Mahout in Action.
Could anyone tell me why I get this result? Is it caused by a specific parameter setting, or by something else?
The core clustering code in this program is:
CanopyDriver.run(vectorsFolder, canopyCentroids, new EuclideanDistanceMeasure(), 250, 120, false, false);
KMeansDriver.run(conf, vectorsFolder, new Path(canopyCentroids, "clusters-0"),
    clusterOutput, new TanimotoDistanceMeasure(), 0.01, 20, true, false);