I am using MiniBatchKMeans from sklearn to cluster data. The parameters were as follows:
from sklearn.cluster import MiniBatchKMeans
clustering = MiniBatchKMeans(n_clusters=100, max_iter=100, random_state=0, reassignment_ratio=0, batch_size=100000, compute_labels=False, verbose=True)
However, although I specified that I wanted 100 clusters, I got only 10. These clusters were also unbalanced (cluster label, number of points):
0 3205211
95 107525
94 76906
97 45161
96 44912
92 11628
93 8914
98 6889
91 3509
90 101
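For reference, here is a minimal sketch of how per-cluster counts like the ones above can be tallied. It assumes the labels come from predict, since with compute_labels=False the labels_ attribute is not filled in by fit. The synthetic array X and the smaller n_clusters, batch_size and max_iter values are stand-ins for illustration, not the original data or settings.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic stand-in data (assumption: the real dataset is not shown in the question).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))

# Same style of configuration as in the question, scaled down so it runs quickly.
clustering = MiniBatchKMeans(
    n_clusters=20,
    max_iter=10,
    random_state=0,
    reassignment_ratio=0,
    batch_size=500,
    compute_labels=False,
    n_init=3,
)
clustering.fit(X)

# With compute_labels=False, assign points to centers explicitly.
labels = clustering.predict(X)

# Count how many points fall into each cluster that actually received points.
ids, counts = np.unique(labels, return_counts=True)

# Print clusters from largest to smallest, as "label count".
for cluster_id, count in sorted(zip(ids, counts), key=lambda pair: -pair[1]):
    print(cluster_id, count)

print("non-empty clusters:", len(ids), "of", clustering.n_clusters)
```

Comparing len(ids) with n_clusters shows how many of the requested centers ended up with no points assigned to them.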
What did I miss? Which parameters do I need to change to get the desired cluster sizes and count?
I found a similar question on Stack Overflow, "KMeans clustering unbalanced data", but I didn't know how to make use of it.