I am using Mini Batch KMeans from sklearn to cluster data. The parameters were as follows:

from sklearn.cluster import MiniBatchKMeans
clustering = MiniBatchKMeans(n_clusters=100, max_iter=100, random_state=0, reassignment_ratio=0, batch_size=100000, compute_labels=False, verbose=True)

However, although I specified 100 clusters, I only got 10. These clusters were also unbalanced:

   0      3205211
   95     107525
   94     76906
   97     45161
   96     44912
   92     11628
   93     8914
   98     6889
   91     3509
   90     101
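
A minimal sketch of how per-cluster counts like these can be obtained and checked. It runs on synthetic data from `make_blobs` as a stand-in, since the original data set is not shown, and it uses smaller, illustrative values for `n_clusters` and `batch_size` so that it runs quickly. Because `compute_labels=False` skips label computation during `fit`, the sketch assigns points with `predict` and then counts the distinct labels with `np.unique`:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (assumption: the real data set is not available here).
X, _ = make_blobs(n_samples=5000, centers=20, n_features=5, random_state=0)

# Same kind of settings as in the question, scaled down for a quick run.
clustering = MiniBatchKMeans(
    n_clusters=20,
    max_iter=100,
    random_state=0,
    reassignment_ratio=0,
    batch_size=1000,
    compute_labels=False,
)
clustering.fit(X)

# Labels were not computed during fit, so assign each point explicitly.
labels = clustering.predict(X)

# Count how many points fall into each cluster that is actually used.
used, counts = np.unique(labels, return_counts=True)
order = np.argsort(counts)[::-1]  # largest cluster first
for label, count in zip(used[order], counts[order]):
    print(label, count)

print(f"{len(used)} of {clustering.n_clusters} clusters are non-empty")
```

Comparing the number of non-empty clusters with `n_clusters` shows whether some centers ended up with no points assigned to them.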

What did I miss? Which parameters do I need to change to get the desired number and size of clusters?

I found a similar question on Stack Overflow, "KMeans clustering unbalanced data", but I didn't know how to make use of it.

Taie
  • please include a complete example. – warped May 17 '20 at 09:04
  • How did you get those numbers displayed? Did you print them? That looks like 100; it starts at 0 and goes to 99. Also, you can make MiniBatchKMeans deterministic by setting random_state=0. Finally, check out the link below. https://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html – ASH May 18 '20 at 22:02
