
I'd like to collapse branches of a dendrogram given a tolerance cutoff.

I'm following dendextend's collapse_branch example.

require(dendextend)
dend <- iris[1:5,-5] %>% dist %>% hclust %>% as.dendrogram 
dend %>% ladderize %>% plot(horiz = TRUE); abline(v = .2, col = 2, lty = 2)


Unlike the dendrograms in dendextend's example, I'd like to replace every collapsed branch (i.e., any clade to the right of the vertical red dashed line) with a triangle, similar to how clades are presented in this figure (from this link):


If this is too much to ask, I'd settle for cutting the branches at the tolerance cutoff.

dan

2 Answers


Getting triangles is indeed a bit too much, but you can color the branches with color_branches, either by height (h) or by the number of clusters (k):

library(dendextend)
dend <- iris[1:5,-5] %>% dist %>% hclust %>% as.dendrogram 
dend %>% color_branches(h=0.2) %>% ladderize %>% plot(horiz = TRUE); abline(v = .2, col = 2, lty = 2)
# OR
# dend %>% color_branches(k=4) %>% ladderize %>% plot(horiz = TRUE); abline(v = .2, col = 2, lty = 2)
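The two calls color the same groups because cutting this tree at height 0.2 leaves exactly four clusters. A quick sanity check with base R's stats::cutree on the underlying hclust object (it only illustrates the h/k equivalence; it does not show how color_branches works internally):

```r
# Same tree as above, kept as an hclust object so stats::cutree can be used
hc <- hclust(dist(iris[1:5, -5]))

groups_h <- cutree(hc, h = 0.2)  # cut by height
groups_k <- cutree(hc, k = 4)    # cut by number of clusters

max(groups_h)                    # number of groups at height 0.2: 4
identical(groups_h, groups_k)    # same partition of the five rows: TRUE
```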


You can also pick the number of clusters with find_k, which chooses the k with the largest average silhouette width (here k = 2):

require(dendextend)
dend <- iris[1:5,-5] %>% dist %>% hclust %>% as.dendrogram 
find_k(dend)$k
dend %>% color_branches(k=find_k(.)$k) %>% ladderize %>% plot(horiz = TRUE); abline(v = .2, col = 2, lty = 2)
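The idea behind find_k can be sketched by hand: cut the tree at several values of k and keep the one with the largest average silhouette width. This sketch uses cluster::silhouette on the original Euclidean distances, which is an assumption; find_k's internals may differ in details. On these five rows it also selects k = 2:

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

d  <- dist(iris[1:5, -5])
hc <- hclust(d)

# Silhouettes are only defined for 2 <= k <= n - 1
krange <- 2:4
avg_width <- sapply(krange, function(k) {
  sil <- silhouette(cutree(hc, k = k), d)  # one row per observation
  mean(sil[, "sil_width"])                 # average silhouette width for this k
})
names(avg_width) <- krange

best_k <- krange[which.max(avg_width)]
best_k  # 2
```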


Tal Galili
Very nice. Is it really impossible to cut the dendrogram at the tolerance cutoff? If a dendrogram has many leaves but far fewer meaningful clusters (so the tolerance cutoff is high), a cut dendrogram would be much easier to read than the default 'jungle' that extends all the way down to the leaves. – dan Jan 28 '17 at 20:52
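On the question of cutting at the cutoff: base R's stats package has a cut() method for dendrograms that does exactly this. A minimal sketch using the question's cutoff of 0.2. Appending each collapsed branch's leaf count to its label is a hypothetical addition (add_size is an illustrative helper), and it relies on the x.member attribute that cut() sets on the new leaves, which is an assumption about stats internals:

```r
# Base R only: stats provides a cut() method for dendrograms
dend <- as.dendrogram(hclust(dist(iris[1:5, -5])))

# $upper is the tree above h; $lower is the list of sub-dendrograms below h
parts <- cut(dend, h = 0.2)

# Each cut-off branch becomes a leaf of $upper labelled "Branch 1", "Branch 2", ...
# Assumption: cut() stores the branch's original leaf count in "x.member"
add_size <- function(node) {
  if (is.leaf(node)) {
    attr(node, "label") <- paste0(attr(node, "label"), " (n=", attr(node, "x.member"), ")")
  }
  node
}
upper <- dendrapply(parts$upper, add_size)

plot(upper, horiz = TRUE); abline(v = .2, col = 2, lty = 2)
```

The branches themselves remain available in parts$lower if you want to plot any of them separately.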

One can use drop.tip() from the ape package to drop the leaves below the cutoff:

require(ape)
require(dendextend)
require(data.tree)

dend <- iris[1:5,-5] %>% dist %>% hclust %>% as.dendrogram 
tol.level <- 0.28
dend %>% plot(horiz = TRUE); abline(v=tol.level,col="red",lty=2)


Our tolerance level is 0.28, so we want to collapse leaves (1,5) and (3,4), since the heights of their ancestral nodes are below tol.level.
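As a cross-check of which leaves sit under a node below tol.level, base R's stats::cutree gives the same leaf sets without data.tree. A minimal sketch; the names groups, clades and to.collapse are only illustrative:

```r
hc <- hclust(dist(iris[1:5, -5]))
tol.level <- 0.28

# Group the leaves by cutting the tree at the tolerance level
groups <- cutree(hc, h = tol.level)

# Leaves that share a group with at least one other leaf hang below a node
# whose height is under tol.level, i.e. they form a clade to collapse
clades <- split(names(groups), groups)
to.collapse <- Filter(function(leaves) length(leaves) > 1, clades)
to.collapse
```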

#convert dendrogram to data.tree
dend.dt <- as.Node(dend)

# for each internal node, get the names of the leaves below it
node.list <- lapply(dend.dt$Get(function(node) node$leaves, filterFun = isNotLeaf),
                    function(n) unname(sapply(unlist(n, recursive = TRUE), function(l) l$name)))
# get the height of each internal node, in the same pre-order as node.list
node.depth.df <- data.frame(depth = sapply(Traverse(dend.dt, traversal = "pre-order", pruneFun = isNotLeaf),
                                           function(x) x$plotHeight))

# leaves below any internal node whose height is under tol.level (unique, since nodes can be nested)
to.drop.leave.names <- unique(unlist(node.list[node.depth.df$depth < tol.level]))

#convert dendrogram to phylo
phylo.dend <- as.phylo(dend)
phylo.dend <- drop.tip(phylo.dend,tip=to.drop.leave.names,interactive=FALSE,trim.internal=FALSE)
plot(phylo.dend,use.edge.length=F)


Finally, chronos() turns the pruned tree into an ultrametric tree (a chronogram). Note that the result is still a phylo object, not a dendrogram:

new.dend <- chronos(phylo.dend)
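To get an actual dendrogram back, ape's as.hclust() method for phylo objects followed by as.dendrogram() should work, assuming the tree is rooted, binary and ultrametric. Whether the pruned tree above meets those conditions is not shown, so this sketch uses a random ultrametric tree from ape::rcoal instead; the seed and the name tr are arbitrary:

```r
library(ape)

set.seed(1)
tr <- rcoal(5)  # random rooted, binary, ultrametric tree with 5 tips

# as.hclust() on a phylo object requires an ultrametric, binary, rooted tree
hc <- as.hclust(tr)
new.dend <- as.dendrogram(hc)

plot(new.dend, horiz = TRUE)
```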
dan