
I have a dendrogram and would like to extract all the labels under a node whose height I already know. For example:

data = data.frame(point = c('A','B','C','D','E'), 
                  x = c(2,2.5,2.1,3,5), 
                  y = c(3.1,4,5,6,2))
d = dist(as.matrix(data[, 2:3])) 
hc = hclust(d,method = "ward.D2")
plot(hc, labels = data$point)

(Image: the resulting dendrogram)

And we know the heights of all the nodes:

hc$height
# [1] 1.029563 1.345362 2.790161 4.584430

Now I would like to get all the labels under the node at a given height. For example, for the height 1.029563 I expect the result c("A", "B"), and for the height 1.345362 I expect c("C", "D").

Can someone help, please?

Konrad Rudolph

2 Answers

1

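If you refer to ?hclust you'd see the somewhat confusing explanation of what the merge component is. In short, row i describes the merge made at step i: a negative entry -j stands for the original observation j, and a positive entry j stands for the cluster formed at step j. In the example you gave: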

hc$merge
#      [,1] [,2]
# [1,]   -1   -2
# [2,]   -3   -4
# [3,]    1    2
# [4,]   -5    3

And also:

hc$height
# [1] 1.029563 1.345362 2.790161 4.584430

You can see the node heights are in order from lowest to highest, so row i of hc$merge describes the node at hc$height[i]. The loop below prints, for each node, every point that has already been merged at or below that node's height:


for (node in seq_along(hc$height)) {         # loop over the nodes
  rows <- hc$merge[1:node, , drop = FALSE]   # merge rows up to and including this node
  points_under_node <- -rows[rows < 0]       # negative entries are original points
  cat("node =", node, "\n")                  # node number
  print(points_under_node)                   # point indices (cumulative over rows 1..node)
  print(data$point[points_under_node])       # point names for those indices
}

A bit iffy but I hope this gets the point across.
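If the goal is only the points under one particular node, one possible approach is to follow the entries of hc$merge recursively: a negative entry is an original point, and a positive entry refers to an earlier row. The sketch below rebuilds data and hc from the question; members_under is a hypothetical helper name, not a function from stats.

```r
# Rebuild the example from the question
data <- data.frame(point = c("A", "B", "C", "D", "E"),
                   x = c(2, 2.5, 2.1, 3, 5),
                   y = c(3.1, 4, 5, 6, 2))
hc <- hclust(dist(as.matrix(data[, 2:3])), method = "ward.D2")

# Hypothetical helper: indices of all original points below merge row `node`
members_under <- function(merge, node) {
  unlist(lapply(merge[node, ], function(child) {
    if (child < 0) {
      -child                        # negative entry: an original point
    } else {
      members_under(merge, child)   # positive entry: an earlier merge row
    }
  }))
}

data$point[members_under(hc$merge, 1)]   # points 1 and 2, i.e. A and B
data$point[members_under(hc$merge, 2)]   # points 3 and 4, i.e. C and D
```

Because hc$height[i] belongs to row i of hc$merge, members_under(hc$merge, i) lists the points under the node drawn at that height.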

Ronny Efronny
  • Thank you for your answer, but with your suggestion the result I get seems different from what I expected. I have written some other "iffy" code that gets what I want; I'll post it later :)) Thanks again – liverpool29 Sep 10 '19 at 16:26
  • @liverpool29 I edited the final bit, it should now print out the actual points `A` through `E`. This loop can be altered to perhaps save the points somewhere else, the print is just for show. Hope this helps. – Ronny Efronny Sep 11 '19 at 07:33
0

With the hint from @nicola and the answer from @Ronny Efronny, I tried to write code that produces the results I want (though it's not good looking):

A = hc$merge
labelList = vector("list", nrow(A))        # one entry per merge step (node)
for (i in seq_len(nrow(A))) {
  if (A[i,1] < 0 && A[i,2] < 0) {          # two single points
    labelList[[i]] = -A[i,]
  } else if (A[i,1] < 0 && A[i,2] > 0) {   # a point and an earlier cluster
    labelList[[i]] = c(-A[i,1], labelList[[A[i,2]]])
  } else if (A[i,1] > 0 && A[i,2] < 0) {   # an earlier cluster and a point
    labelList[[i]] = c(-A[i,2], labelList[[A[i,1]]])
  } else {                                 # two earlier clusters
    labelList[[i]] = c(labelList[[A[i,1]]], labelList[[A[i,2]]])
  }
}

Then labelList[[i]] gives the indices of the points under the node with height hc$height[i], and data$point[labelList[[i]]] gives their labels.

It would be nice if someone could help me tidy up this code :))
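One possibly tidier route, offered as a sketch rather than a definitive answer, is to convert the tree with as.dendrogram and search for the node whose "height" attribute matches the known height. labels_at_height is a hypothetical helper name. The tolerance tol is an assumption, needed because printed heights such as 1.029563 are rounded. If several nodes share a height, only the first match is returned.

```r
# Rebuild the example from the question
data <- data.frame(point = c("A", "B", "C", "D", "E"),
                   x = c(2, 2.5, 2.1, 3, 5),
                   y = c(3.1, 4, 5, 6, 2))
hc <- hclust(dist(as.matrix(data[, 2:3])), method = "ward.D2")
hc$labels <- as.character(data$point)   # make the tree carry A..E as labels
dend <- as.dendrogram(hc)

# Hypothetical helper: labels under the first node whose height is within tol of h
labels_at_height <- function(node, h, tol = 1e-6) {
  if (abs(attr(node, "height") - h) < tol) return(labels(node))
  if (is.leaf(node)) return(NULL)        # reached a single point without a match
  for (i in seq_along(node)) {           # search each child branch in turn
    found <- labels_at_height(node[[i]], h, tol)
    if (!is.null(found)) return(found)
  }
  NULL                                   # no node at this height in this branch
}

labels_at_height(dend, 1.029563)   # the labels A and B
labels_at_height(dend, 1.345362)   # the labels C and D
```

The function returns NULL when no node lies within tol of the requested height, so a caller can tell a miss from a match.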