
I am using scipy.cluster.hierarchy.linkage as a clustering algorithm and pass the resulting linkage matrix to scipy.cluster.hierarchy.fcluster to get flat clusters for various thresholds.

I would like to calculate the silhouette score of each result and compare them to choose the best threshold. I would prefer not to implement it myself but to use scikit-learn's sklearn.metrics.silhouette_score. How can I rearrange my clustering results into the input that sklearn.metrics.silhouette_score expects?

J.J

1 Answer


You don't have to.

The label array returned by fcluster can be passed directly to silhouette_score as its labels argument.

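import scipy.cluster.hierarchy, scipy.spatial.distance
from sklearn import metrics
# distmatrix is assumed to hold one triangle of the pairwise distances; adding its transpose gives the full symmetric matrix, which squareform condenses for linkage
distmatrix1 = scipy.spatial.distance.squareform(distmatrix + distmatrix.T)
ddgm = scipy.cluster.hierarchy.linkage(distmatrix1, method="average")
nodes = scipy.cluster.hierarchy.fcluster(ddgm, 4, criterion="maxclust")
# the first argument is a distance matrix, not feature vectors, so use metric='precomputed' rather than 'euclidean'
metrics.silhouette_score(distmatrix + distmatrix.T, nodes, metric='precomputed')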
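
Since the question is about comparing several thresholds, here is a minimal sketch of that loop. The toy data, the candidate thresholds, and the names `X`, `square`, `Z`, `scores` and `best_t` are all made up for illustration and are not part of the original code; it also assumes you cut the tree with `criterion="distance"`, which may not match your setup.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import silhouette_score

# Hypothetical toy data: three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

condensed = pdist(X)             # condensed distances, the form linkage expects
square = squareform(condensed)   # full symmetric matrix for silhouette_score
Z = linkage(condensed, method="average")

scores = {}
for t in (1.0, 2.0, 4.0, 8.0):   # candidate distance thresholds (illustrative)
    labels = fcluster(Z, t=t, criterion="distance")
    n_clusters = len(np.unique(labels))
    # silhouette_score needs between 2 and n_samples - 1 distinct labels
    if 2 <= n_clusters <= len(X) - 1:
        scores[t] = silhouette_score(square, labels, metric="precomputed")

# Threshold with the highest silhouette score among the valid ones.
best_t = max(scores, key=scores.get)
print(best_t, scores[best_t])
```

The guard on the number of clusters matters because silhouette_score raises an error when a threshold collapses everything into one cluster or leaves every point on its own.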
mlworker
  • Just to add a little detail for anyone coming to this problem as I did, being a little confused by the addition going on in this answer: The `distmatrix + distmatrix.T` portion is just your X (the features used to generate the columns) and the `nodes` is your y (the labels of the clusters). You can just reference these directly out of your dataframe, rather than as separate objects. – WJTownsend Jan 17 '23 at 00:17