When I use hclust in R to plot a dendrogram, the y axis is labeled Height. However, I'd like to label this axis with the similarity levels between the groups in my dataset, something like the image below. How can I achieve this?

Here is a minimal example:

set.seed(1)
x = matrix(rnorm(1000), ncol=100)
d = dist(x, method="euclidean")
plot(hclust(d, method="complete"))
VisioN
Marcus Nunes
  • 2
    Do you just want to change the label, or do you want to change what is plotted? – David Robinson Jan 13 '13 at 21:35
  • 1
    Similarity and distance are opposites. hclust() works with a distance measure (e.g. Euclidean distance), so large distances mean very dissimilar and small distances mean very similar. How are you planning to construct a similarity measure based on Euclidean distance (e.g. 100 - d, or 1 - d/max(d))? – dcarlson Jan 13 '13 at 22:18
  • 1
    "Do you just want to change the label, or do you want to change what is plotted?" I wanna change what is plotted, not just the label. Like dcarlson said, I need to construct a similarity measure, not a distance. How can I do it? – Marcus Nunes Jan 14 '13 at 01:54

1 Answer

Starting with your example, but saving the cluster results as hc:

set.seed(1)
x <- matrix(rnorm(1000), ncol=100)
d <- dist(x, method="euclidean")
hc <- hclust(d, method="complete")

hc$height
[1] 12.79157 13.05586 13.51490 13.54069 14.32658 14.45824 15.70899 16.44131
[9] 17.12514

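The merge heights (distances) range from 12.8 to 17.1. For simplicity, use 18 - d as the similarity measure: the tree is still drawn on the distance scale, and the y axis is relabeled so that a height of h reads as 18 - h.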

plot(hc, hang=-1, ylab="Similarity", axes=FALSE)
axis(2, seq(0, 18, by=2), seq(18, 0, by=-2))

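The constant 18 is specific to this data set. As a rough sketch of the other option raised in the comments, 1 - d/max(d), the axis can be labeled on a 0 to 1 scale instead. The names d_max, sim_ticks and tick_pos are made up for this illustration, and the similarity definition itself is an assumption, not something hclust provides:

```r
set.seed(1)
x  <- matrix(rnorm(1000), ncol = 100)
d  <- dist(x, method = "euclidean")
hc <- hclust(d, method = "complete")

# Assumed similarity definition (not part of hclust): s = 1 - h / max(d)
d_max     <- max(d)
sim_ticks <- seq(0, 1, by = 0.2)       # similarity values to print on the axis
tick_pos  <- (1 - sim_ticks) * d_max   # matching positions on the height scale

# Draw the tree without the default axis, then add the relabeled one
plot(hc, hang = -1, ylab = "Similarity", axes = FALSE)
axis(2, at = tick_pos, labels = sim_ticks)
```

Here similarity 1 sits at height 0 and similarity 0 at the largest pairwise distance, so the dendrogram itself is unchanged and only the axis labels differ.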

dcarlson