
I am applying k-means (or another clustering algorithm) to some data. I want the clustering to have a good silhouette score, but at the same time I prefer fewer clusters. How can I evaluate the number of clusters jointly with the silhouette score (or another metric)?

For example, the clustering produced these results:

  • k = 2 clusters: silhouette = 0.534

  • k = 7 clusters: silhouette = 0.617

  • k = 20 clusters: silhouette = 0.689
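A table like the one above can be produced by fitting k-means once per candidate k and computing the mean silhouette of each result. The sketch below assumes scikit-learn and uses synthetic `make_blobs` data as a stand-in for the real dataset, so its scores will not match the numbers in the question; `silhouette_by_k` is a hypothetical helper name.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data (assumption): 500 points drawn from 7 Gaussian blobs.
X, _ = make_blobs(n_samples=500, centers=7, random_state=0)


def silhouette_by_k(X, ks, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in ks:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


scores = silhouette_by_k(X, [2, 7, 20])
for k, s in scores.items():
    print(f"k = {k}: silhouette = {s:.3f}")
```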

I think the model with 7 clusters is the best of the three. Although the last model has the highest score, it has too many clusters. I tried dividing the silhouette score by the number of clusters, but that seems too naive.
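Working the division through on the three scores from the question shows why it feels naive: the silhouette lies in [-1, 1] and the three values are close together, so the ratio is dominated by 1/k and simply picks the smallest k, not the 7 clusters the question prefers.

```python
# Silhouette scores from the question, keyed by number of clusters.
scores = {2: 0.534, 7: 0.617, 20: 0.689}

# The ad-hoc criterion tried in the question: silhouette divided by k.
ratio = {k: s / k for k, s in scores.items()}
for k, r in ratio.items():
    print(f"k = {k}: silhouette / k = {r:.3f}")

best = max(ratio, key=ratio.get)
print("picked k =", best)
```

This prints ratios of 0.267, 0.088 and 0.034 and picks k = 2.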

Ruriko
  • Welcome to Stack Overflow. This is not a programming question; it is more of a judgment call about whether k-means fits your idea. I am voting to close this question, sorry. – PV8 Sep 12 '19 at 06:54

1 Answer


Don't hack. Do it properly.

That means defining mathematically what "good" means to you (and, of course, justifying why your proposed equation captures it well). Then use this evaluation measure, but be prepared for others to disagree with your view that many clusters are bad.
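One way to make "good" explicit is a penalized score such as silhouette(k) − λ·k, where λ states how much silhouette you are willing to give up per additional cluster. This particular criterion, the helper name `penalized_silhouette`, and the λ values below are illustrative assumptions, not a standard method; the point is that the chosen k follows directly from λ, which is exactly the judgment you have to state and defend.

```python
def penalized_silhouette(scores, lam):
    """Return {k: silhouette(k) - lam * k}.

    lam (hypothetical parameter) is the silhouette you are willing to give up
    for each additional cluster; choosing it is your own judgment call.
    """
    return {k: s - lam * k for k, s in scores.items()}


scores = {2: 0.534, 7: 0.617, 20: 0.689}  # values from the question

for lam in (0.0, 0.01, 0.02):
    penalized = penalized_silhouette(scores, lam)
    best = max(penalized, key=penalized.get)
    print(f"lambda = {lam}: best k = {best}")
```

With these scores, λ = 0 picks k = 20, λ = 0.01 picks k = 7, and λ = 0.02 picks k = 2, so the "right" answer depends entirely on the trade-off you commit to.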

Yes, silhouette divided by the number of clusters is not a good idea. In particular, it is not a theoretically well-founded model, is it?

Has QUIT--Anony-Mousse