I have a distance matrix:

array('d', [188.61516889752, 226.68716730362135, 188.96015266132167])

I would like to add labels to the matrix before performing hierarchical clustering with SciPy.

I produce a UPGMA dendrogram from the distance matrix using:

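from array import array
from scipy.cluster.hierarchy import average, fcluster

# Condensed distance matrix shown above (pairwise distances between 3 items)
distanceMatrix = array('d', [188.61516889752, 226.68716730362135, 188.96015266132167])
outDND = average(distanceMatrix)  # UPGMA linkage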

I have tried adding the labels to the dendrogram using:

from scipy.cluster.hierarchy import average, fcluster

outDND = average(distanceMatrix, labels=['A', 'B', 'C'])

But that does not work. I get the error:

TypeError: average() got an unexpected keyword argument 'labels'

How can I add labels to 'distanceMatrix' and have them carry through to outDND?

Jamie
  • Please expand on "does not work" - Do you get an error? Are there labels other than what you expect? No labels at all? – Sarah Messer Nov 28 '22 at 20:54
  • I get the error: TypeError: average() got an unexpected keyword argument 'labels' – Jamie Nov 28 '22 at 20:59
  • The immediate TypeError is because you can't use the "labels" keyword in a call to that `average()` function. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.average.html Still looking for options in putting labels on the dendrogram – Sarah Messer Nov 28 '22 at 21:16
  • Possible duplicate / reference: https://stackoverflow.com/questions/35873273/display-cluster-labels-for-a-scipy-dendrogram – Sarah Messer Nov 28 '22 at 21:17

1 Answer

It looks like you're missing a couple steps between "create the distance matrix" and "create the dendrogram".

See this other StackOverflow question for several worked examples.

In general, SciPy and the underlying NumPy tend not to include labels in their data structures (unlike, say, pandas, which does track labels). That means you're responsible for keeping a separate list of labels and matching it to the correct order and indices yourself.

The steps you'll need are:
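A minimal sketch of one way to do this, assuming the three values in your matrix are the condensed pairwise distances A-B, A-C, B-C. The label list, the variable names, and the cut height of 200 are illustrative assumptions, not anything SciPy requires. The linkage itself never sees the labels; they are passed to `dendrogram()` (which accepts a `labels` argument) or zipped with the output of `fcluster()`, both of which follow the original observation order.

```python
import numpy as np
from scipy.cluster.hierarchy import average, dendrogram, fcluster

# Labels live in a separate list, in the same order as the original observations.
labels = ['A', 'B', 'C']  # assumed names for the three items

# Condensed distance matrix, assumed order: d(A,B), d(A,C), d(B,C)
distance_matrix = np.array([188.61516889752, 226.68716730362135, 188.96015266132167])

# UPGMA linkage; the result refers to observations by index only.
linkage_matrix = average(distance_matrix)

# dendrogram() maps indices back to labels. no_plot=True skips drawing and
# just returns the layout; 'ivl' holds the leaf labels in plotted order.
tree = dendrogram(linkage_matrix, labels=labels, no_plot=True)
leaf_order = tree['ivl']

# fcluster() returns one cluster id per observation, in the original order,
# so zipping with the label list attaches a name to each assignment.
# The threshold 200.0 is an arbitrary cut height chosen for illustration.
cluster_ids = fcluster(linkage_matrix, t=200.0, criterion='distance')
label_to_cluster = {name: int(cid) for name, cid in zip(labels, cluster_ids)}

print(leaf_order)
print(label_to_cluster)
```

Dropping `no_plot=True` draws the tree with matplotlib, with the labels on the leaves.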

Sarah Messer
  • Bummer, I was hoping there was a built-in parameter to associate the labels with their index. Thank you for being so thorough in your response. One note though; it is not necessary to use 'scipy.cluster.hierarchy.linkage()' because 'outDND=average(distanceMatrix)' performs UPGMA hierarchical clustering on a pre-computed distance matrix. – Jamie Nov 28 '22 at 23:40