I want to create a data frame that has my row names, their order according to hclust, and their group membership according to cutree. I'm having a hard time reconciling the outputs from these functions.
d <- dist(USArrests)
hc <- hclust(d)
pheatmap::pheatmap(as.matrix(d)[hc$order, hc$order], cluster_cols = FALSE, cluster_rows = FALSE) # what I expect to see
data.frame(
  state = hc$labels,
  hc = cutree(hc, h = 0), # hc doesn't match heat map
  family = cutree(hc, k = 5) # family order doesn't agree with hc order
)
Based on my understanding of hierarchical clustering, hc and family should both be in the same order. For example:
hc = 1 2 3 4 5 6 # smallest to greatest
family = 1 1 2 2 2 3 # smallest to greatest
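One possible reading of the mismatch, offered as a sketch rather than a definitive answer: cutree() returns memberships in the original row order of the data, not in dendrogram order. With h = 0 every state ends up in its own cluster (assuming no two states are at distance zero), so that column simply counts 1 to 50 down the rows. The position of each state along the dendrogram is the inverse of the hc$order permutation. The names pos, by_row, and by_dend below are illustrative only.

```r
d  <- dist(USArrests)
hc <- hclust(d)

# hc$order lists original row indices from left to right along the dendrogram.
# order(hc$order) inverts that permutation: for each original row,
# it gives the row's position along the dendrogram.
pos <- order(hc$order)

# Everything here is still in the original row order of USArrests.
by_row <- data.frame(
  state    = hc$labels,
  position = pos,
  family   = cutree(hc, k = 5)
)

# Sorting by position puts the rows in the same order as the heat map.
by_dend <- by_row[order(by_row$position), ]
head(by_dend)
```

Keeping the table in original row order and sorting at the end avoids mixing the two orderings inside one data.frame() call.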
Update:
This gets me the result I'm looking for, but I'd like to know if there's a simpler way to do it:
data.frame(
  state = hc$labels[hc$order],
  row = seq_along(hc$labels),
  family = cutree(hc, k = 5)[hc$order]
)
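A hedged variant of the same idea, in case the family labels should also increase along the dendrogram: cutree() numbers clusters by first appearance in the original data, so after reordering the labels can come out as, say, 1 1 3 3 2 rather than 1 1 2 2 3. Since each cluster from cutree() is a contiguous block in hc$order, relabelling by first appearance along the dendrogram gives a non-decreasing column. The names fam and res are illustrative.

```r
d  <- dist(USArrests)
hc <- hclust(d)

# Memberships reordered to follow the dendrogram from left to right.
fam <- cutree(hc, k = 5)[hc$order]

res <- data.frame(
  state  = hc$labels[hc$order],
  row    = seq_along(hc$order),      # position along the dendrogram
  family = match(fam, unique(fam))   # relabel 1..k by first appearance
)
head(res)
```

The relabelling only renames the groups; which states share a family is unchanged.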