
I am trying to plot a dendrogram from a DataFrame created with the pandas package in Python. Example data is shown below.

import numpy as np
from pandas import DataFrame
import matplotlib.pyplot as plt
from hcluster import pdist, linkage, dendrogram

Index= ['aaa','bbb','ccc','ddd','eee']
Cols = ['A', 'B', 'C','D']
df = DataFrame(abs(np.random.randn(5, 4)), index= Index, columns=Cols)


>>> df
            A         B         C         D
aaa  0.987415  0.192240  0.709559  0.317106
bbb  0.856932  0.252441  1.183127  0.712855
ccc  1.687198  0.462673  1.046469  0.159287
ddd  0.977152  2.657582  0.491975  0.027280
eee  0.120464  0.945034  0.142658  0.537024
>>> 

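X = df.T.values  # transpose so that each column becomes one observation
Y = pdist(X)     # pairwise distances between the columns
Z = linkage(Y)   # hierarchical clustering on those distances
dendrogram(Z)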

The above code generates the dendrogram, but the column names are missing from the plot. How can I keep them as the leaf labels?

Curious
  • The column names are lost when you use only 'values'. I have never used dendrogram, but after a quick scan through its documentation I would try: dendrogram(Z, labels=df.T.columns) – Wouter Overmeire Sep 21 '12 at 12:07
  • Thanks. Got it right. dendrogram(Z, labels = df.columns) worked for me. – Curious Sep 21 '12 at 12:58
  • In case someone finds this nowadays, here is a Python 3.x compatible version: `from scipy.spatial.distance import pdist` `from scipy.cluster.hierarchy import linkage, dendrogram`. hcluster has not been updated since 2008; the clustering code now lives in SciPy. – mac Mar 31 '17 at 06:31

1 Answer


As suggested by @Wouter Overmeire, the following worked for me:

X = df.T.values  # transpose so that each column becomes one observation
Y = pdist(X)
Z = linkage(Y)
dendrogram(Z, labels=df.columns)  # label each leaf with its column name
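
For a current Python 3 setup, the same steps can be sketched with the SciPy imports mentioned in the comments. This is only a minimal sketch under a few assumptions that are not part of the original post: the random data is seeded so the run is repeatable, the labels are converted to a plain list (which avoids any version-dependent handling of a pandas Index), and `no_plot=True` is used so the returned labels can be inspected without opening a figure.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

# Example data shaped like the question's (values are random; seed is an assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    np.abs(rng.standard_normal((5, 4))),
    index=["aaa", "bbb", "ccc", "ddd", "eee"],
    columns=["A", "B", "C", "D"],
)

X = df.T.values   # transpose so that each column becomes one observation
Y = pdist(X)      # condensed pairwise distances between the 4 columns
Z = linkage(Y)    # hierarchical clustering (default method is single linkage)

# no_plot=True skips drawing and only returns the dendrogram layout;
# "ivl" holds the leaf labels in the order they would appear on the axis.
result = dendrogram(Z, labels=df.columns.tolist(), no_plot=True)
leaf_labels = result["ivl"]
print(leaf_labels)
```

To draw the figure instead, drop `no_plot=True` and call `matplotlib.pyplot.show()` afterwards.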
Curious