
I am trying to use the k-means algorithm in Python with the scikit-learn library. Is there any way to generate the centroids in ascending order? For example, here is my code:

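import pandas as pd
from sklearn.cluster import KMeans

# X and Y are arrays holding the data for one year
kmeanDataFrame = pd.DataFrame({'x': X, 'y': Y})
kmean = KMeans(init='k-means++', n_clusters=6, random_state=0, n_init=10)
kmean.fit(kmeanDataFrame)
print(kmean.labels_)
print(kmean.cluster_centers_)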

Here X and Y are arrays; I am passing in countries' population-ranking data for different years. The centroids keep changing: for instance, when I pass in the 2011 data, it generates centroids like this:

[[ 4.22019639  2.88409457]
[ 1.15267995  0.7954897 ]
[ 2.49913831  1.64727509]
[-1.71104298 -1.54454861]
[ 6.99545873  6.08921786]
[ 0.20412018  0.0517948 ]]

and when I pass in the 2012 data, it generates these:

[[ 0.94596298  0.64243913]
[ 4.2710023   3.0083124 ]
[-0.27485671 -0.35197801]
[ 2.41465001  1.59198646]
[-6.514922   -4.53656495]
[ 7.77638888  7.18733868]]

Is there any way that I can generate centroids in ascending order (first negative points, then positive points) like this:

[[-1.71104298 -1.54454861],
[ 0.20412018  0.0517948 ],
[ 1.15267995  0.7954897 ],
[ 2.49913831  1.64727509],
[ 4.22019639  2.88409457],
[ 6.99545873  6.08921786]]
marc_s
  • For the discussion: this post might be helpful in general, but it is not for Python: https://stackoverflow.com/questions/17685327/get-ordered-kmeans-cluster-labels – Dr. H. Lecter Mar 01 '20 at 12:20

2 Answers


Suppose you happened to have the following clustering:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Random example data standing in for the real X and Y
np.random.seed(42)
X = np.random.rand(10000)
Y = np.random.rand(10000)
kmeanDataFrame = pd.DataFrame({'x': X, 'y': Y})
kmean = KMeans(init='k-means++', n_clusters=6, random_state=0, n_init=10)
kmean.fit(kmeanDataFrame)

cc = kmean.cluster_centers_
print(cc)

[[0.14575507 0.27937172]
 [0.76783063 0.80079467]
 [0.47849743 0.14838875]
 [0.2147012  0.79923057]
 [0.48920425 0.5285314 ]
 [0.83935504 0.27354554]]

Then you can sort the centroids either by the 0th column:

idx = np.argsort(cc[:,0])
cc[idx,:]
array([[0.14575507, 0.27937172],
       [0.2147012 , 0.79923057],
       [0.47849743, 0.14838875],
       [0.48920425, 0.5285314 ],
       [0.76783063, 0.80079467],
       [0.83935504, 0.27354554]])

or by the 1st column:

idx = np.argsort(cc[:,1])
cc[idx,:]
array([[0.47849743, 0.14838875],
       [0.83935504, 0.27354554],
       [0.14575507, 0.27937172],
       [0.48920425, 0.5285314 ],
       [0.2147012 , 0.79923057],
       [0.76783063, 0.80079467]])
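If the cluster labels should follow the same ascending order as the sorted centroids, the `argsort` result can also be used to renumber them. The following is only a sketch under assumptions: the data is synthetic, and the names `rank`, `sorted_centers`, and `sorted_labels` are introduced here for illustration and are not part of the code above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data, assumed here only for illustration.
rng = np.random.default_rng(42)
data = rng.random((1000, 2))

kmean = KMeans(init='k-means++', n_clusters=6, random_state=0, n_init=10)
kmean.fit(data)

# Original cluster indices, ordered by ascending first coordinate.
idx = np.argsort(kmean.cluster_centers_[:, 0])
sorted_centers = kmean.cluster_centers_[idx]

# rank[old_label] is the position of that cluster in the sorted order.
rank = np.empty_like(idx)
rank[idx] = np.arange(len(idx))

# Renumber every sample's label so that label k refers to sorted_centers[k].
sorted_labels = rank[kmean.labels_]
```

With this renumbering, label 0 always refers to the centroid with the smallest first coordinate, which may make labels from separate fits easier to compare.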
Sergey Bushmanov
  • Basically, the purpose is not to sort afterward. I am checking countries' population data over different years and trying to analyze which countries change their cluster position over the years. That's why I wanted KMeans to produce the centroids ordered from negative to positive values. – Imran Ahmad Shahid Mar 01 '20 at 15:08
  • I see. As a first attempt, you could try the `init` param, where you can pass predefined centroids to initialize the cluster search. To give you a more meaningful answer: what is your purpose in doing that, apart from pure aesthetics? – Sergey Bushmanov Mar 01 '20 at 15:12
  • KMeans is random by its nature. Try passing your centroids in the desired order to `init` and see if the result satisfies you. – Sergey Bushmanov Mar 01 '20 at 15:26
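The `init` suggestion from the comments could look roughly like the sketch below. The data and the starting centroids in `init_centers` are hypothetical placeholders chosen for illustration. The fitted centroids are not guaranteed to stay in the order they were passed in, so the result is worth checking.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder data, assumed here only for illustration.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))

# Hypothetical starting centroids, listed in the desired ascending order.
init_centers = np.array([
    [-2.0, -2.0],
    [-1.0, -1.0],
    [-0.3, -0.3],
    [0.3, 0.3],
    [1.0, 1.0],
    [2.0, 2.0],
])

# With an explicit init array, a single run (n_init=1) is sufficient.
kmean = KMeans(n_clusters=6, init=init_centers, n_init=1)
kmean.fit(data)
print(kmean.cluster_centers_)
```

For the year-over-year comparison described in the question, one option under the same assumptions would be to pass the sorted centroids from one year as `init` for the next year.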

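After generating the centroids, the list of centroids can be sorted with Python's built-in sorted function. The array is converted to a list of lists first, because sorted cannot compare the rows of a 2-D NumPy array directly. As in the line of code below:

sorted_centers = sorted(kmean.cluster_centers_.tolist())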

Sana
  • Basically, the purpose is not to sort afterward. I am checking countries' population data over different years and trying to analyze which countries change their cluster position over the years. That's why I wanted KMeans to produce the centroids ordered from negative to positive values. – Imran Ahmad Shahid Mar 01 '20 at 15:08