Determining the number of clusters for kdd99 dataset using k-means

Question

What is the general convention for choosing k when running k-means on the KDD99 dataset? Three different papers I read use three completely different values of k (25, 20 and 5). I would like to know the general opinion on this, for example what range k should fall in.

Thanks

score 0 · Answer 1 · answered May 21 '19 at 20:08

The K-means clustering algorithm is used to find groups that have not been explicitly labeled in the data. In general there is no method for determining the exact value of K, but it can be estimated.

To estimate K, compute the mean distance between data points and their cluster centroid and compare it across different values of K.
The elbow method and the kernel method work more precisely and are recommended, but the right number of clusters can depend on your problem. A quick alternative is to take the square root of half the number of data points, k ≈ sqrt(n/2), and use that as the number of clusters.
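
To make the two estimates above concrete, here is a minimal pure-Python sketch, not a tuned implementation. The function names (`rule_of_thumb_k`, `kmeans`, `mean_distance_to_centroid`, `elbow_curve`) are my own, the k-means is a plain Lloyd's iteration with random initial centroids, and it assumes the points are numeric tuples (KDD99's categorical features would need encoding and scaling first).

```python
import math
import random


def rule_of_thumb_k(n_points):
    """Quick estimate: k is roughly sqrt(n / 2)."""
    return max(1, round(math.sqrt(n_points / 2)))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as starting centroids
    for _ in range(iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def mean_distance_to_centroid(points, centroids):
    """Mean distance from each point to its nearest centroid."""
    total = sum(min(math.dist(p, c) for c in centroids) for p in points)
    return total / len(points)


def elbow_curve(points, k_values):
    """Map each candidate k to the mean point-to-centroid distance."""
    return {
        k: mean_distance_to_centroid(points, kmeans(points, k))
        for k in k_values
    }
```

You would call `elbow_curve(points, range(1, 31))`, plot the values against k, and pick the k after which the curve flattens out. `rule_of_thumb_k(len(points))` only gives a starting point for that range; since it grows with the dataset size, it can land far from the small values used in the papers you mention.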
