
I am trying to make a heatmap showing gene expression across 4 different groups, and I would like to cluster samples within each group. My samples are sorted by group across the columns. Using cluster_cols = TRUE clusters across all groups at once, which mixes samples from different groups together. How can clustering be done only within each group with pheatmap?

shilovk
gtresto

1 Answer


I had a similar question recently. As far as I can tell, recent pheatmap versions have no built-in option for this, so my current workaround is:

1. Generate a sample order based on the first principal component (PC1):

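# data_heatmap is the data tibble/matrix passed to pheatmap (genes in rows, samples in columns)
# Row-scale the data; the first right singular vector gives one PC1 score per sample
pc1 <- svd(t(scale(t(data_heatmap))), nu = 1, nv = 1)$v[, 1]
scaledExpr <- scale(t(data_heatmap))
averExpr <- rowMeans(scaledExpr, na.rm = TRUE)
# The sign of a singular vector is arbitrary; orient it along the average expression
if (cor(averExpr, pc1) < 0) {
      pc1 <- -pc1
}
# Sample indices sorted by PC1 score
index_eigen <- order(pc1)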

2. Cluster within each group and align with the PC1 order:

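# s2c_f is the sample data frame, one row per column of data_heatmap (same order), with a "Group" column
index_reorder <- c()
index_pre <- seq_along(s2c_f$Group)
for (eachgroup in unique(s2c_f$Group)) {
      # Samples of this group, already sorted by PC1
      index_tempEigen <- index_eigen[index_eigen %in% index_pre[s2c_f$Group == eachgroup]]
      if (length(index_tempEigen) < 2) {
        # hclust needs at least two samples; keep a single sample as-is
        index_reorder <- c(index_reorder, index_tempEigen)
        next
      }
      sampleDist <- dist(t(data_heatmap[, index_tempEigen]), method = "euclidean")
      sampleClust <- hclust(sampleDist, method = "complete")
      index_clust <- sampleClust$order
      # Reverse the dendrogram order if it runs against the PC1 order
      if (cor(index_clust, seq_along(index_tempEigen)) < 0) {
        index_clust <- rev(index_clust)
      }
      index_reorder <- c(index_reorder, index_tempEigen[index_clust])
}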

3. Pass the reordered data to pheatmap with cluster_cols=FALSE:

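library(pheatmap)

s2c_f <- s2c_f[index_reorder,]
# Assumes s2c_f$Sample holds the column names of data_heatmap
data_heatmap <- data_heatmap[,s2c_f$Sample]

# Assumes s2c_f$Color holds one color per group
ann_colors = list(Group = c(unique(s2c_f$Color)))
names(ann_colors[[1]]) = unique(s2c_f$Group)
df <- as.data.frame(s2c_f[,"Group",drop=FALSE])
# pheatmap matches annotation rows to heatmap columns by row name
rownames(df) <- s2c_f$Sample

# silent=TRUE suppresses drawing; set it to FALSE to display the plot directly
pheatmap(data_heatmap, 
         scale='row',
         color = colorRampPalette(c("navy", "white", "firebrick3"))(50),
         show_rownames=TRUE,
         cluster_cols=FALSE, 
         cluster_rows=TRUE, 
         annotation_colors=ann_colors[1],
         annotation_col=df,
         gaps_row = NULL, gaps_col = NULL,
         silent=TRUE)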


I think all of the above could easily be wrapped in a function. The example only covers clustering columns within groups, where the columns are samples; clustering rows within groups would work the same way on the transposed data.
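As a rough sketch of such a wrapper, the function below returns a column order that clusters samples inside each group and keeps the groups in their original order. It is a simplified version of the steps above: it skips the PC1 alignment, and the function name `order_within_groups` and the toy data are made up for illustration.

```r
# Hypothetical helper: cluster columns within each group, keep groups contiguous
order_within_groups <- function(mat, groups) {
  stopifnot(ncol(mat) == length(groups))
  new_order <- integer(0)
  for (g in unique(groups)) {
    idx <- which(groups == g)
    if (length(idx) > 1) {
      # hclust needs at least two samples; cluster this group's columns only
      hc <- hclust(dist(t(mat[, idx, drop = FALSE])), method = "complete")
      idx <- idx[hc$order]
    }
    new_order <- c(new_order, idx)
  }
  new_order
}

# Toy data (made up): 20 genes x 8 samples in 3 groups, one of them a singleton
set.seed(1)
mat <- matrix(rnorm(20 * 8), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("s", 1:8)))
groups <- c("A", "A", "A", "B", "B", "B", "B", "C")

ord <- order_within_groups(mat, groups)
groups[ord]  # samples are reordered, but each group stays in one block
```

The reordered matrix `mat[, ord]` can then be passed to pheatmap with cluster_cols=FALSE, as in step 3.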

Another potential solution is the ComplexHeatmap package, which can split the columns by group and cluster within each split.
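A minimal sketch of the ComplexHeatmap route, assuming the package is installed from Bioconductor. The toy matrix and group labels are made up; `column_split` and `cluster_column_slices` are arguments of `Heatmap()`. Unlike pheatmap's scale='row', no row scaling is applied here, so scale the matrix beforehand if needed.

```r
# Toy data (made up): 20 genes x 8 samples in 3 groups
set.seed(1)
mat <- matrix(rnorm(20 * 8), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("s", 1:8)))
group <- factor(c("A", "A", "A", "B", "B", "B", "C", "C"))

# Only draw if ComplexHeatmap is available
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  ht <- ComplexHeatmap::Heatmap(
    mat,
    name = "expr",
    column_split = group,           # one column slice per group
    cluster_columns = TRUE,         # cluster samples inside each slice
    cluster_column_slices = FALSE   # keep slices in factor-level order
  )
  ComplexHeatmap::draw(ht)
}
```

With a categorical `column_split`, the column dendrogram is computed separately for each slice, so samples never cross group boundaries.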

Raymond