
I want to run a hierarchical clustering and plot a classic dendrogram with a heatmap. This is reasonably easy using heatmap.2 or heatmap.3 in R, and seems reasonably easy in Python as well. What I can't find a good solution for is annotating the tree.

Ideally, I'd like to color-code the branches according to metadata. Say I have ~10k rows of 5 different types; after clustering, I'd like to visualize how these types group together. Labeling each row isn't feasible given the amount of data.

Coloring the tree by cluster/distance seems possible, but that's not what I want.

The classifying vector for the colors could be either a separate column or part of the rownames.

Either R or Python is fine. Thanks!

Edit:

Example:

library(gplots)
library(proxy)
df = data.frame(matrix(rnorm(100), nrow=10))
rownames(df) <- c("A_1","A_2","A_3","B_1","B_2","B_3","C_1","C_2","C_3","C_4")
df <- t(df)
distance.matrix.df <- dist(as.matrix(df), method='pearson')
clust.df1 <- hclust(distance.matrix.df, method = "average")
dend.dfc <- as.dendrogram(clust.df1)
heatmap.2(as.matrix(df), Rowv=dend.dfc, keysize=1, dendrogram="col", trace="none")

Output: Here

Desired output: Here

Tal Galili
Myggan

1 Answer


In R you could try it like this:

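library(gplots)      # heatmap.2
library(dendextend)  # branches_attr_by_clusters and the %>% pipe
# df and dend.dfc come from the example in the question
dend <- df %>% t %>% dist %>% hclust %>% as.dendrogram %>%
  branches_attr_by_clusters(as.numeric(as.factor(substr(labels(.), 1, 1))),
                            attr="col")
heatmap.2(as.matrix(df), Rowv=dend.dfc, Colv=dend, keysize=1,
          dendrogram="col", trace="none")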

which gives you a heatmap whose column dendrogram branches are colored by group (the first letter of each label).
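For readers unfamiliar with the pipe, the chain above can be unpacked into explicit steps. This is only a sketch: it rebuilds the question's data as a plain matrix with a hypothetical seed, and it uses base R's `dist` (Euclidean) and `hclust` defaults rather than the Pearson/average settings from the question. The intermediate names (`samples`, `d`, `hc`, `groups`) are made up for illustration.

```r
library(dendextend)

set.seed(1)  # hypothetical seed, only to make the sketch reproducible
df <- matrix(rnorm(100), nrow = 10)
rownames(df) <- c("A_1","A_2","A_3","B_1","B_2","B_3","C_1","C_2","C_3","C_4")
df <- t(df)  # as in the question: labelled samples end up in the columns

samples <- t(df)           # df %>% t: one row per labelled sample
d       <- dist(samples)   # %>% dist: pairwise distances between samples
hc      <- hclust(d)       # %>% hclust: hierarchical clustering
dend    <- as.dendrogram(hc)  # %>% as.dendrogram

# One integer per leaf, in dendrogram order, derived from the label prefix
groups <- as.numeric(as.factor(substr(labels(dend), 1, 1)))

# Color the branches whose leaves belong to the same group
dend <- branches_attr_by_clusters(dend, groups, attr = "col")
plot(dend)
```

The group vector has to follow the leaf order of the dendrogram, which is why it is built from `labels(dend)` rather than from the original row order.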

lukeA
  • Thanks! That seems to do the trick! Now I just have to understand it; especially "df %>% t %>% dist %>% hclust %>% as.dendrogram %>% " is a bit of an enigma to me. – Myggan Oct 31 '15 at 13:37
  • @Myggan This just creates the distance matrix, hclust object and dendrogram in one line - you can read more about the pipe operator `%>%` [here](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html). – lukeA Oct 31 '15 at 18:01
  • In addition to the solution described, it might also work to use the "ColSideColors" parameter in heatmap.2. – Tal Galili Nov 01 '15 at 07:25
  • @lukeA Thanks again, it makes a lot more sense now that I realize it's a pipe (though a very verbose pipe, one would think) :) – Myggan Nov 01 '15 at 12:46
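The `ColSideColors` suggestion from the comments could look roughly like the sketch below. Instead of coloring branches, it draws a colored strip above the columns, one color per sample type. The seed, the `type.colors` mapping and the variable names are assumptions chosen for illustration; `Rowv = FALSE` is set only so that the row ordering matches `dendrogram = "col"`.

```r
library(gplots)

set.seed(1)  # hypothetical seed, only to make the sketch reproducible
df <- matrix(rnorm(100), nrow = 10)
rownames(df) <- c("A_1","A_2","A_3","B_1","B_2","B_3","C_1","C_2","C_3","C_4")
df <- t(df)  # as in the question: labelled samples end up in the columns

# Sample type taken from the first letter of each column name
type <- substr(colnames(df), 1, 1)

# Hypothetical mapping from type to color
type.colors <- c(A = "red", B = "blue", C = "darkgreen")
side.colors <- unname(type.colors[type])

# ColSideColors expects one color per column, in the original column order
heatmap.2(df, ColSideColors = side.colors, Rowv = FALSE,
          keysize = 1, dendrogram = "col", trace = "none")
```

With ~10k items the strip is likely easier to read than individual labels, since each type shows up as blocks of color under the dendrogram.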