Are there any limits on k-means in terms of the number of clusters k, data dimensionality, and data size (millions of samples)?

Question

I have a dataset that consists of 2 million samples. I want to use k-means to cluster this dataset into 2000 clusters. Is it OK to use this number of clusters with this data size?

Note: the feature vector of each sample has size 1000.

It can process the data. But you probably have too many outliers. How will you know if the result is good? — Has QUIT--Anony-Mousse, Mar 02 '18 at 08:08

score 0 · Answer 1 · answered Feb 27 '18 at 10:29

To predict the runtime of an algorithm, you can take a look at its time complexity. This is a formula that relates the run time to some parameters, for instance the number of data points and the number of clusters in k-means. Information about the time complexity of k-means clustering can be found here: Computational complexity of k-means
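As a rough illustration of using such a formula, here is a minimal sketch for the numbers in the question. It assumes the standard Lloyd's algorithm, whose assignment step compares every sample with every centroid and so costs on the order of n * k * d multiply-adds per iteration, and it assumes the data is stored densely as 32-bit floats. The function names are made up for this example, and the number of iterations needed to converge is not modeled.

```python
def lloyd_ops_per_iteration(n_samples, n_clusters, n_features):
    """Approximate multiply-adds in one assignment step of Lloyd's k-means.

    Each of the n samples is compared with each of the k centroids,
    and each comparison touches all d features: n * k * d.
    """
    return n_samples * n_clusters * n_features


def dense_data_bytes(n_samples, n_features, bytes_per_value=4):
    """Memory for the raw data matrix (assumption: dense float32 storage)."""
    return n_samples * n_features * bytes_per_value


# Numbers from the question: 2 million samples, 2000 clusters, 1000 features.
n, k, d = 2_000_000, 2_000, 1_000

ops = lloyd_ops_per_iteration(n, k, d)
mem = dense_data_bytes(n, d)

print(f"{ops:.1e} multiply-adds per iteration")   # 4.0e+12
print(f"{mem / 1e9:.1f} GB for the data matrix")  # 8.0
```

The operation count is linear in each of n, k, and d, so this kind of estimate mainly tells you how the run time scales when one of them changes, not whether the resulting clusters are any good.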
