
I am using dendextend to work with dendrograms in R. For a tree cut at a given k, I want to draw only the highest node within each cluster. Is there a convenient way to do that?

Reproducible example below.

###### sample tree ######

library(dendextend)
library(dplyr)

dend15 <- c(1:5) %>%
   dist %>%
   hclust(method = "average") %>%
   as.dendrogram()


###### plotting with clusters based on k #######

## it is not obvious from the available tutorials,
## but you can control which nodes are (not) displayed by
## passing a vector containing NAs to the set() function

node_xy <- get_nodes_xy(dend15) ## xy coordinates of all nodes
## setting node shapes
v <- rep(NA, nrow(node_xy)) ## one entry per node; NA means the node is not drawn
v[c(2,5)] <- 19 ## shape 19 for nodes 2 and 5 (I know because I looked at `node_xy`)

dend15 <- dend15 %>% set("nodes_pch",v) ## set nodes shapes

dend15 %>% plot(main="sample tree with highest nodes in k")
dend15 %>% rect.dendrogram(k=2) ## draw rectangles around clusters at cut

[Figure: a dendrogram cut at k=2]

I want to determine which nodes to draw for any dendrogram and any k. More generally, I guess I am asking how to use information from cutree() and rect.dendrogram() together with dendextend data to control nodes.

UPD

The heights_per_k.dendrogram() function returns the tree heights for the different k cuts, so nodes can be selected by these heights, e.g. height @ k > node_h > height @ k+1. However, this misses the case where a single leaf forms a cluster (as at k=3 in the example above).
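
One possible way to cover the single-leaf case is sketched below. It walks the nodes in the same pre-order that get_nodes_xy() uses, marks the first node found at or below the cut height for k, and then skips that node's whole subtree. The helper name top_nodes_at_k is hypothetical and not part of dendextend. The sketch assumes a binary tree (as produced by hclust), that heights_per_k.dendrogram() names its result by k, and that leaves have height 0.

```r
library(dendextend)

# Hypothetical helper (not part of dendextend): returns a logical vector, in the
# pre-order used by get_nodes_xy(), that is TRUE for the highest node of each
# cluster when the tree is cut into k clusters.
# Assumptions: binary tree, and k is reachable by a horizontal cut.
top_nodes_at_k <- function(dend, k) {
  h_all <- heights_per_k.dendrogram(dend)      # cut heights, assumed named by k
  h_cut <- unname(h_all[as.character(k)])
  if (is.na(h_cut)) stop("no horizontal cut gives k = ", k, " clusters")

  heights <- get_nodes_attr(dend, "height")    # pre-order; leaves have height 0
  members <- get_nodes_attr(dend, "members")   # number of leaves under each node
  n <- length(heights)
  is_top <- logical(n)

  i <- 1
  while (i <= n) {
    if (heights[i] <= h_cut) {
      # first node at or below the cut on this path: it is the top of a cluster
      is_top[i] <- TRUE
      # skip its subtree; a binary subtree with m leaves has 2 * m - 1 nodes
      i <- i + 2 * members[i] - 1
    } else {
      i <- i + 1
    }
  }
  is_top
}

# usage on the sample tree from the question
dend <- as.dendrogram(hclust(dist(1:5), method = "average"))
k <- 3
v <- ifelse(top_nodes_at_k(dend, k), 19, NA)   # shape 19 for cluster tops, NA otherwise
dend <- set(dend, "nodes_pch", v)
plot(dend, main = "highest node of each cluster")
rect.dendrogram(dend, k = k)
```

Because a leaf has height 0, it is marked whenever its parent lies above the cut, so single-leaf clusters are included. A k that no horizontal cut can produce (which can happen when several merges share the same height) makes the helper stop with an error rather than guess.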

perechen