
I found this implementation of k-medoids and decided to try it in my code.

My original dataset is a 21x6 matrix.

To generate the distance matrix I'm using:

import scipy.spatial.distance as ssd
distanceMatrix = ssd.squareform(ssd.pdist(matr, 'cosine'))

Then I decide on a number of clusters:

clusters = int(np.sqrt(len(matr.data)/2))
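
For a 21-row matrix this is the common k ≈ sqrt(n/2) rule of thumb and should evaluate to 3 (matching the three medoids in the output below), assuming len(matr.data) returns the number of rows; a quick sanity check:

import numpy as np
print(int(np.sqrt(21 / 2)))  # k ≈ sqrt(n/2) -> 3 for n = 21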

And finally:

clusters, medoids = self.cluster(distanceMatrix, clusters)
print(clusters)
print(medoids)

For the given input, I get this output:

[12 12 12 12 12 12 12  7  7  7  7 11 12 12 12 12 12 12 11 12 12]
[12  7 11]

I was expecting output similar to sklearn.cluster.KMeans, where I get a label for each point in my matrix. How should I treat this kind of output if I want to use the result to scatter the cluster elements, like in the picture below (where I used k-Means)?

[kmeans-example]

    That code is not kMedoids (PAM). It's something else. – Has QUIT--Anony-Mousse Jul 24 '15 at 13:53
  • @Anony-Mousse thank you for pointing this out, I'm not a clustering expert at all. Would this implementation be any better? http://www.researchgate.net/publication/272351873_NumPy__SciPy_Recipes_for_Data_Science_k-Medoids_Clustering – Vektor88 Jul 24 '15 at 14:37
  • Doesn't look like the algorithm either, compare to L. Kaufman and P. Rousseeuw, “Clustering by Means of Medoids,” in Statistical Data Analysis Based on the L1 Norm and Related Methods, Y. Dodge, Ed. Elsevier, 1987, pp. 405–416. – Has QUIT--Anony-Mousse Jul 24 '15 at 14:52

1 Answer


k-medoids uses actual data points as cluster centers, so print(medoids) gives you the indices of the centers in your input dataset, and print(clusters) tells you, for each point, the index of the medoid it is assigned to.
The stars in the graph would be dataset[12], dataset[11] and dataset[7].
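
If you want labels in the same 0..k-1 form that sklearn.cluster.KMeans returns, one option is to remap each medoid index to a compact label and then scatter a 2-D projection of the data. A minimal sketch, assuming matr, clusters and medoids are the variables from the question (the PCA step is just one way to get 2-D coordinates for plotting a 6-dimensional dataset):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# remap medoid indices (here 12, 7 and 11) to compact labels 0..k-1
label_of = {m: i for i, m in enumerate(medoids)}
labels = np.array([label_of[c] for c in clusters])

# project the 21x6 data to 2-D so the clusters can be scattered like in the k-Means picture
xy = PCA(n_components=2).fit_transform(matr)
plt.scatter(xy[:, 0], xy[:, 1], c=labels)
plt.scatter(xy[medoids, 0], xy[medoids, 1], marker='*', s=200, c='red')  # the medoid points
plt.show()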
