I am using dendextend to work with dendrograms in R. I want to draw only the nodes that sit at the highest height within each cluster, where the clusters are defined by cutting the tree at a given k. Is there a convenient way to do that?
A reproducible example is below.
###### sample tree ######
library(dendextend)
library(dplyr)
dend15 <- c(1:5) %>%
dist %>%
hclust(method = "average") %>%
as.dendrogram()
###### plotting with clusters based on k #######
## it is not obvious from the available tutorials,
## but you can control which nodes are (not) displayed by
## passing a vector containing NAs to the set() function
node_xy <- get_nodes_xy(dend15) ## xy coordinates of all nodes
## setting node shapes
v <- rep(NA, nrow(node_xy)) ## one entry per node, NA = do not draw
v[c(2,5)] <- 19 ## shape 19 for nodes 2 and 5 (I know because I looked at `node_xy`)
dend15 <- dend15 %>% set("nodes_pch",v) ## set node shapes
dend15 %>% plot(main="sample tree with highest nodes in k")
dend15 %>% rect.dendrogram(k=2) ## draw rectangles around clusters at cut
I want to determine the nodes that I need to draw for any dendrogram and any k. More generally, I guess I am asking how to use the information from cutree() and rect.dendrogram() together with dendextend data to control nodes.
UPD
The heights_per_k.dendrogram() function returns the tree heights for the different k cuts, so nodes can be selected by these heights, e.g. height @ k > node_h > height @ k+1. However, this does not cover the case where a single leaf forms a cluster (at k=3 in the example above).
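For reference, here is a minimal sketch of the height-window rule described in the update, to show where it falls short. The helper name nodes_in_k_window is made up for this post. The sketch assumes that get_nodes_attr(dend, "height") returns node heights in the same order as get_nodes_xy(), and that the names of heights_per_k.dendrogram() are the k values.

```r
library(dendextend)

## same sample tree as above
dend <- as.dendrogram(hclust(dist(1:5), method = "average"))

## hypothetical helper: indices of nodes whose height lies strictly
## between the cut height for k and the cut height for k + 1
nodes_in_k_window <- function(dend, k) {
  cut_h  <- heights_per_k.dendrogram(dend)  ## cut heights, named by resulting k
  node_h <- get_nodes_attr(dend, "height")  ## heights of all nodes, leaves included
  ## [[ ]] fails with an error if no cut yields exactly k (or k + 1)
  ## clusters, which can happen when merge heights are tied
  upper <- cut_h[[as.character(k)]]
  lower <- cut_h[[as.character(k + 1)]]
  which(node_h < upper & node_h > lower)
}

idx <- nodes_in_k_window(dend, k = 2)

## mark the selected nodes, as in the example above
v <- rep(NA, nrow(get_nodes_xy(dend)))
v[idx] <- 19
plot(set(dend, "nodes_pch", v), main = "nodes inside the k window")
rect.dendrogram(dend, k = 2)
```

Besides single-leaf clusters, this window can also skip a cluster whose top node lies below the k+1 cut height, so it is only a partial solution.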