I use haifengl/smile and I need to find the optimal number of clusters.

I am using CLARANS, which requires me to specify the number of clusters to create. My idea is to try a range of values, for example from 2 to 10 clusters, compare the results, and choose the number of clusters that gives the best result. How can this be done with the Elbow method?

tambovflow

3 Answers

The appropriate number of clusters, such that elements within a cluster are similar to each other and dissimilar to elements in other clusters, can be determined with a variety of techniques, such as:

  • Gap Statistic: compares the total within-cluster variation for different values of k with its expected value under a null reference distribution of the data.

  • Silhouette Method: the optimal number of clusters k is the one that maximizes the average silhouette over a range of possible values for k.

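  • Sum of Squares method: plot the total within-cluster sum of squares against k and look for the point where the decrease levels off (the "elbow").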
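A minimal sketch of the Gap Statistic and its usual selection rule (choose the smallest k with Gap(k) >= Gap(k+1) - s(k+1)), in plain Python with no third-party packages. Assumptions: k-means (Lloyd's algorithm with farthest-first initialisation) is used as the clustering step, not CLARANS; the null reference data are drawn uniformly from the bounding box of the observed data; the function names are hypothetical; and the k values passed in must be consecutive integers.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_dispersion(points, k, max_iter=100):
    """Run Lloyd's k-means (farthest-first initialisation) and return W_k,
    the total squared distance of the points to their nearest cluster center."""
    centers = [points[0]]
    while len(centers) < k:
        # Next initial center: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move the center to the mean of its members (keep it if the cluster is empty).
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members))
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def gap_statistic(points, k_values, n_refs=10, seed=0):
    """Return {k: (gap_k, s_k)} for every k in k_values."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    results = {}
    for k in k_values:
        log_w = math.log(kmeans_dispersion(points, k))
        ref_logs = []
        for _ in range(n_refs):
            # Null reference: uniform samples over the bounding box of the data.
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
            ref_logs.append(math.log(kmeans_dispersion(ref, k)))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        results[k] = (mean_ref - log_w, sd * math.sqrt(1 + 1 / n_refs))
    return results


def choose_k(results):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); falls back to the largest k."""
    ks = sorted(results)
    for k, k_next in zip(ks, ks[1:]):
        if results[k][0] >= results[k_next][0] - results[k_next][1]:
            return k
    return ks[-1]


# Usage on synthetic data: three tight, well-separated 2-D blobs.
rng = random.Random(42)
data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
        for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(20)]
gaps = gap_statistic(data, range(1, 7))
best_k = choose_k(gaps)
print(best_k)
```

With Smile you would replace `kmeans_dispersion` by a call to your clustering of choice; the gap computation itself does not depend on which algorithm produces W_k.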

For more details, see the scikit-learn documentation on this subject.

mnm

The Elbow method is not automatic.

You compute the scores for the desired range of k, plot them, and then visually try to find an "elbow", which may or may not work.
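The "compute the scores for a range of k" step can be sketched in plain Python. This is an illustration under assumptions, not Smile code: a simple alternating k-medoids routine stands in for CLARANS (both are k-medoids methods, but this is not Smile's implementation), the score is the total distance of every point to its nearest medoid, the "plot" is a text bar chart, and the helper names `k_medoids_cost` and `elbow_costs` are hypothetical. In Smile you would run CLARANS once per k and record the cost of each result instead.

```python
import math
import random


def k_medoids_cost(points, k, max_iter=100):
    """Simple alternating k-medoids, used here as a stand-in for CLARANS.
    Returns the total distance of every point to its nearest medoid."""
    medoids = [points[0]]
    while len(medoids) < k:
        # Farthest-first initialisation keeps this sketch deterministic.
        medoids.append(max(points, key=lambda p: min(math.dist(p, m) for m in medoids)))
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for p in points:
            clusters[min(medoids, key=lambda m: math.dist(p, m))].append(p)
        # New medoid: the member with the smallest total distance to its cluster.
        new_medoids = [min(members, key=lambda c: sum(math.dist(c, q) for q in members))
                       for members in clusters.values()]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def elbow_costs(points, k_values):
    """Clustering cost for each candidate number of clusters."""
    return {k: k_medoids_cost(points, k) for k in k_values}


# Usage on synthetic data: four tight blobs at the corners of a square.
rng = random.Random(1)
data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
        for cx, cy in [(0, 0), (8, 0), (0, 8), (8, 8)] for _ in range(25)]
costs = elbow_costs(data, range(2, 11))
top = max(costs.values())
for k, cost in costs.items():
    # Crude text "plot": the bar length is proportional to the cost.
    print(f"k={k:2d}  cost={cost:7.2f}  " + "#" * round(40 * cost / top))
```

You still have to decide by eye where the bars stop shrinking sharply; the code only produces the numbers.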

Because x and y have no "correct" relation to each other, beware that the interpretation of the plot (and any geometric attempt to automate it) depends on the scaling of the plot and is inherently subjective. In the end, the entire concept of an "elbow" is likely flawed and not sound in this form. I'd rather look for more advanced measures where you can argue for the maximum or minimum, although some notion of a "significantly better k" would be desirable.

Has QUIT--Anony-Mousse

Ways to find the number of clusters:

1- Silhouette method:

Whether you compute it yourself from separation and cohesion or use an existing implementation, the optimal number of clusters is the one with the maximum silhouette coefficient. The silhouette coefficient ranges from -1 to 1, and 1 is the best value. See the example of the silhouette method with scikit-learn.
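Computing the silhouette yourself from cohesion and separation takes only a few lines. The sketch below is plain Python; the function name `silhouette` and the two labelings are hypothetical (in practice each labeling would come from running your clustering, e.g. CLARANS, with that k). Singleton clusters get a score of 0, which is the usual convention.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient over all points.

    For point i: a = mean distance to the other members of its cluster (cohesion),
    b = mean distance to the members of the nearest other cluster (separation),
    s(i) = (b - a) / max(a, b), which lies in [-1, 1]."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # The sum includes p itself (distance 0), hence the division by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Usage: hypothetical labelings that clustering runs produced for k = 2 and k = 3.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0), (10, 1)]
labelings = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
scores = {k: silhouette(points, labels) for k, labels in labelings.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Unlike the elbow, this gives a maximum you can pick directly, which matches the "loop over k and keep the best" idea in the question.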

2- Elbow method (it can be applied automatically):

The elbow method plots the number of clusters against the average sum of squared distances of the points to their cluster centers. To apply it automatically in Python, you can use the Kneed library, which detects the knee (elbow) point of a curve. See the Kneed repository.
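Kneed implements the Kneedle algorithm; its `KneeLocator` class takes the x and y values plus `curve` and `direction` arguments (`curve="convex"`, `direction="decreasing"` for a typical elbow curve). If you prefer no dependency, a simpler geometric heuristic, which is not Kneedle itself, picks the k whose point lies farthest from the straight line through the first and last points of the curve. The function name `find_elbow` and the cost values below are hypothetical.

```python
def find_elbow(ks, costs):
    """Return the k whose (k, cost) point lies farthest from the straight line
    through the first and last points of the curve."""
    if len(ks) != len(costs) or len(ks) < 3:
        raise ValueError("need at least three (k, cost) pairs of equal length")
    x0, y0, x1, y1 = ks[0], costs[0], ks[-1], costs[-1]

    def dist_to_chord(point):
        x, y = point
        # Perpendicular distance up to a constant factor (the chord length),
        # which does not change which point is farthest.
        return abs((y1 - y0) * (x - x0) - (x1 - x0) * (y - y0))

    return max(zip(ks, costs), key=dist_to_chord)[0]


# Usage with made-up costs for k = 1..8 (illustrative numbers, not real results).
ks = [1, 2, 3, 4, 5, 6, 7, 8]
costs = [100, 60, 30, 25, 21, 18, 16, 15]
print(find_elbow(ks, costs))  # prints 3
```

Because the reference line is drawn between the endpoints, the detected elbow can change if you scan a different range of k.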

Ahmed Wael