
I would like to extract the hierarchical structure of the nodes of a dendrogram or cluster.

For instance, consider the following dendrogram:

library(dendextend)
dend15 <- c(1:5) %>% dist %>% hclust(method = "average") %>% as.dendrogram
dend15 %>% plot

The nodes are numbered according to their position in the dendrogram (see the figure below).

[Figure: plot of dend15 with its nodes numbered 1 to 9]

(Figure extracted from the dendextend package's tutorial)

For each final leaf, I would like to get all the nodes on its path up to the root, as in the following output (leaves are listed from left to right, and each path runs from bottom to top):

        hierarchical structure
leaf_1: 3-2-1
leaf_2: 4-2-1
leaf_3: 6-5-1
leaf_4: 8-7-5-1
leaf_5: 9-7-5-1

Thanks in advance,

Tal Galili
Ruben

1 Answer


First, I find the set of leaves under every node (i.e. each node's subtree). In your example there are 9 nodes, so this gives 9 subtrees.

subtrees <- partition_leaves(dend15)
leaves <- subtrees[[1]]   # the first entry is the root node, whose subtree contains every leaf
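
To see what the node numbers refer to, it can help to inspect `subtrees` directly. The following is a minimal sketch, assuming `partition_leaves()` returns one entry per node in the same depth-first order used to number the nodes in the figure, which is what the paths in this answer rely on. Each number in a path is simply a position in this list.

```r
library(dendextend)

# Rebuild the dendrogram from the question
dend15 <- 1:5 %>% dist %>% hclust(method = "average") %>% as.dendrogram
subtrees <- partition_leaves(dend15)

# One entry per node (internal nodes and leaves alike)
length(subtrees)
nnodes(dend15)

# Number of leaves under each node; entries of size 1 are the leaves themselves
node_sizes <- lengths(subtrees)
node_sizes

# Positions of the terminal nodes (the leaves) in the node numbering
which(node_sizes == 1)
```

Under that assumption, node 9 is the last entry of `subtrees`: the subtree consisting of the rightmost leaf alone.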

Then I write a helper function that finds the route for a single leaf (the indices of all subtrees containing it) and apply it to every leaf.

pathRoutes <- function(leaf) {
  which(sapply(subtrees, function(x) leaf %in% x))
}

paths <- lapply(leaves, pathRoutes)

The raw output is a list in which each element is the structure for one end node (leaf), ordered from the root down to the leaf:

> paths
[[1]]
[1] 1 2 3

[[2]]
[1] 1 2 4

[[3]]
[1] 1 5 6

[[4]]
[1] 1 5 7 8

[[5]]
[1] 1 5 7 9
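
The paths above run from the root down to the leaf, whereas the question asks for them bottom to top and joined with dashes. One possible way to get that format is sketched below; the `leaf_` names and the `structure_strings` variable are illustrative choices, not part of dendextend, and the leaves are numbered by their left-to-right position rather than by their labels.

```r
library(dendextend)

# Rebuild the dendrogram and the paths from the steps above
dend15 <- 1:5 %>% dist %>% hclust(method = "average") %>% as.dendrogram
subtrees <- partition_leaves(dend15)
leaves <- subtrees[[1]]

paths <- lapply(leaves, function(leaf) {
  which(sapply(subtrees, function(x) leaf %in% x))
})

# Reverse each path (leaf first, root last) and join the node numbers with "-"
structure_strings <- vapply(
  paths,
  function(p) paste(rev(p), collapse = "-"),
  character(1)
)

# Name the entries by left-to-right leaf position (hypothetical naming scheme)
names(structure_strings) <- paste0("leaf_", seq_along(leaves))
structure_strings
```

Wrapping the result in `data.frame(hierarchical_structure = structure_strings)` would give a one-column table close to the layout in the question.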
Ricky
  • Still not sure how to interpret these path numbers. For example, what is '9' supposed to refer to? – horaceT Nov 14 '20 at 05:40
  • Also, what is `length(paths)` supposed to tell us? In this case, it's 5. Is that the number of "layers"? – horaceT Nov 14 '20 at 05:45