
I want to create a data frame that has my row names, their order according to hclust, and their group membership according to cutree. I'm having a hard time reconciling the outputs from these functions.

d <- dist(USArrests)
hc <- hclust(d)
pheatmap::pheatmap(as.matrix(d)[hc$order, hc$order], cluster_cols = FALSE, cluster_rows = FALSE) # what I expect to see

data.frame(
  state = hc$labels,
  hc = cutree(hc, h = 0), # hc doesn't match heat map
  family = cutree(hc, k = 5) # family order doesn't agree with hc order
)

Based on my understanding of hierarchical clustering, hc and family should both be in the same order. For example:

hc     = 1 2 3 4 5 6 # smallest to greatest
family = 1 1 2 2 2 3 # smallest to greatest
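
For reference, a minimal sketch of how the two outputs relate, assuming the same `d` and `hc` as above. `hc$order` lists the original row indices from left to right along the dendrogram, while `cutree()` returns one label per row in the original row order. Inverting the permutation gives each state's dendrogram position without reordering the rows first. The names `pos` and `by_row` are introduced here for illustration only.

```r
d  <- dist(USArrests)
hc <- hclust(d)

# hc$order[i] is the original row index of the i-th leaf, so
# order(hc$order) inverts it: pos[j] is the leaf position of row j.
pos <- order(hc$order)

by_row <- data.frame(
  state  = hc$labels,          # original row order
  hc     = pos,                # position along the dendrogram
  family = cutree(hc, k = 5),  # cluster label, also in original row order
  row.names = NULL
)

# Sorting by dendrogram position recovers the heat map's row order.
by_row <- by_row[order(by_row$hc), ]
```

Sorted this way, `hc` runs from 1 to 50 and each `family` forms a contiguous block, although the `family` labels themselves are not necessarily increasing.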

Update:

This gets me the result I'm looking for, but I want to see if there's a simpler way to do it:

data.frame(
  state = hc$labels[hc$order],
  row = seq_along(hc$labels),
  family = cutree(hc, k = 5)[hc$order]
)
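
A possible refinement, sketched under the assumption that `family` should also count up from 1 in dendrogram order, as in the example above. `cutree()` numbers clusters by their first appearance in the original data, so after reordering the labels can come out in any sequence. Relabelling with `match()` against the order of first appearance is one way to fix that; `fam` and `ordered` are hypothetical names, not part of any package.

```r
d  <- dist(USArrests)
hc <- hclust(d)

# Cluster labels rearranged into dendrogram (left-to-right) order.
fam <- cutree(hc, k = 5)[hc$order]

ordered <- data.frame(
  state  = hc$labels[hc$order],
  row    = seq_along(hc$order),       # position along the dendrogram
  family = match(fam, unique(fam)),   # relabel 1, 2, ... by first appearance
  row.names = NULL
)
```

Because each cluster occupies a contiguous run of leaves, the relabelled `family` column is non-decreasing alongside `row`.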
Jeff Bezos