I am writing a program that (as a part of it) automatically creates dendrograms from an input dataset. For each node/split I want to extract all the labels that are under that node and the location of that node on the dendrogram plot (for further plotting purposes). So, let's say my data looks like this:

> Ltrs <- data.frame("A" = c(3,1), "B" = c(1,1), "C" = c(2,4), "D" = c(6,6))
> dend <- as.dendrogram(hclust(dist(t(Ltrs))))
> plot(dend)

[Image: the resulting dendrogram]

Now I can extract the location of the splits/nodes:

> library(dendextend)
> nodes <- get_nodes_xy(dend)
> nodes <- nodes[nodes[,2] != 0, ]
> nodes
      [,1]     [,2]
[1,] 1.875 7.071068
[2,] 2.750 3.162278
[3,] 3.500 2.000000

Now I want to get all the labels under each node (that is, for each row of the `nodes` variable).

This should look something like this:

$`1`
[1] "D" "C" "B" "A"

$`2`
[1] "C" "B" "A"

$`3`
[1] "B" "A"

Can anybody help me out? Thanks in advance :)

Jurr.Ian
  • I find working with the dendrogram structure very confusing. It would probably be much easier to work with the `hclust` object and `cutree`: you could, e.g., loop over different values of `k` to get to the nodes. – JBGruber Mar 17 '18 at 13:37

2 Answers

How about something like this?

library(tidyverse)
library(dendextend)
Ltrs <- data.frame("A" = c(3,1), "B" = c(1,1), "C" = c(2,4), "D" = c(6,6))
dend <- as.dendrogram(hclust(dist(t(Ltrs))))

accumulator <- list()
# Recursively collect leaf labels; every internal node appends its labels to `accumulator`.
myleaves <- function(anode) {
    if (!is.list(anode)) return(attr(anode, "label"))  # a leaf: return its label
    accumulator[[length(accumulator) + 1]] <<- reduce(lapply(anode, myleaves), c)
}

myleaves(dend)
ret <- rev(accumulator)  # generation was depth first, so the root was found last

Better test this. I am not very trustworthy. In particular, I really hope the list ret is in an order that makes sense, otherwise it's going to be a pain associating the entries with the correct nodes! Good luck.
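One way to reduce the ordering worry is to record each node before descending into its children (pre-order), which avoids both the global accumulator and the final `rev()`. The following is a minimal base-R sketch, not the code above: `leaf_labels` and `collect_nodes` are hypothetical helper names, and it assumes that `get_nodes_xy()` also lists nodes in pre-order. Each entry keeps the node height so that assumption can be checked against the second column of `nodes`.

```r
Ltrs <- data.frame("A" = c(3,1), "B" = c(1,1), "C" = c(2,4), "D" = c(6,6))
dend <- as.dendrogram(hclust(dist(t(Ltrs))))

# Hypothetical helper: all leaf labels at or below a node, left to right.
leaf_labels <- function(node) {
  if (is.leaf(node)) return(attr(node, "label"))
  unlist(lapply(seq_len(length(node)), function(i) leaf_labels(node[[i]])))
}

# Hypothetical helper: internal nodes in pre-order, each with height and labels.
collect_nodes <- function(node) {
  if (is.leaf(node)) return(list())
  here <- list(list(height = attr(node, "height"), labels = leaf_labels(node)))
  below <- lapply(seq_len(length(node)), function(i) collect_nodes(node[[i]]))
  c(here, unlist(below, recursive = FALSE))
}

res <- collect_nodes(dend)
heights <- vapply(res, function(n) n$height, numeric(1))
labs <- lapply(res, function(n) n$labels)
```

If `heights` matches `nodes[, 2]` row for row, then `labs[[i]]` holds the labels under the node in row `i`.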

steveLangsford
  • Thanks Steve. Indeed, the order of the ret list is a bit weird. I think it is the same order as in the hclust$merge variable. I will look into that, thanks again. – Jurr.Ian Mar 19 '18 at 14:40

The dendextend function partition_leaves() extracts the leaf labels under each node and returns a list ordered in the same way as the output of get_nodes_xy(). With your example,

library(dendextend)
Ltrs <- data.frame("A" = c(3,1), "B" = c(1,1), "C" = c(2,4), "D" = c(6,6))
dend <- as.dendrogram(hclust(dist(t(Ltrs))))
plot(dend)

partition_leaves(dend)

yields:

[[1]]
[1] "D" "C" "A" "B"

[[2]]
[1] "D"

[[3]]
[1] "C" "A" "B"

[[4]]
[1] "C"

[[5]]
[1] "A" "B"

[[6]]
[1] "A"

[[7]]
[1] "B"

Filtering this list by vector length (keeping only the entries with more than one label) gives output similar to the desired one.
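The filtering step might look like the sketch below. It relies on the ordering described above (that `partition_leaves()` and `get_nodes_xy()` list nodes in the same order), so the same logical index can subset both; the variable names `is_internal`, `labels_per_node` and `nodes` are only illustrative.

```r
library(dendextend)

Ltrs <- data.frame("A" = c(3,1), "B" = c(1,1), "C" = c(2,4), "D" = c(6,6))
dend <- as.dendrogram(hclust(dist(t(Ltrs))))

leaves <- partition_leaves(dend)  # one character vector per node, leaves included
xy <- get_nodes_xy(dend)          # one row per node, assumed to be in the same order

# Internal nodes (splits) are the entries with more than one label.
is_internal <- lengths(leaves) > 1

labels_per_node <- leaves[is_internal]
nodes <- xy[is_internal, , drop = FALSE]
names(labels_per_node) <- seq_along(labels_per_node)
```

After this, row `i` of `nodes` and `labels_per_node[[i]]` should describe the same split. Selecting by label count rather than by `nodes[,2] != 0` also keeps any merge that happens at height zero, which can occur when two observations are identical.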

taprs