
I am trying to generate a high-resolution dendrogram in R.

The difficulty is that there are more than 200 leaf nodes, and each node is identified by a string. I would like to ensure that each of these string labels is readable in the generated (printed) plot.

I would also like to swap the axes, so that the leaf nodes (originally on the x-axis) run along the y-axis and the distance (originally on the y-axis) runs along the x-axis. To make the plot easier to read, I would like to add a second x-axis (showing the distance information in the swapped plot) at the top of the plot. How can one do this in R?

joran
user785099
  • Could you provide more information about which functions you intend to use to make these plots? There are multiple plotting systems in R (ggplot, lattice, base). Furthermore, if you could provide us with an example dataset, that would help very much. – Paul Hiemstra Dec 07 '11 at 15:16
  • For 200 strings, side by side, to be readable, you're going to need to print it on some pretty big paper. – Richie Cotton Dec 07 '11 at 15:26
  • Hi Paul, the input will be a 210*210 dissimilarity matrix based on the correlation coefficient. I am new to R, so I do not have much idea which library I should use for generating this kind of plot. I notice that more than one library can do that, but I do not know which one supports high-resolution plotting and allows me to swap the axes and add more axes if I want. – user785099 Dec 07 '11 at 15:29
  • One more clarification: I will generate this plot for a poster. – user785099 Dec 07 '11 at 15:30
  • The `ggdendro` package allows you to convert dendrogram plotting data to data.frames which you can then plot using `ggplot`. This might help you solve most of these problems (except the dual axis). http://cran.r-project.org/web/packages/ggdendro/index.html – Andrie Dec 07 '11 at 15:31
  • In addition to Andrie's comment: using ggsave from the ggplot2 package, you can quite easily save a high-resolution png or pdf file. – Paul Hiemstra Dec 07 '11 at 15:33

2 Answers


You can achieve this with standard R functions.

Plot a dendrogram

To plot a dendrogram from a distance matrix you can use the hclust function. See its man page for further details on the algorithms available.

# To produce a dummy distance matrix
distMatrix <- dist(matrix(1:9, ncol=3))

# To convert it into a tree
tree <- hclust(distMatrix)
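
The question starts from a square dissimilarity matrix rather than raw data. As a minimal sketch of that case, assuming made-up data and 1 - correlation as the dissimilarity (the names `dat`, `dissim`, `corDist` and `corTree` are hypothetical), `as.dist` converts the square matrix into the object `hclust` expects:

```r
# Hypothetical data: 50 observations of 10 named variables (illustration only)
set.seed(1)
dat <- matrix(rnorm(50 * 10), ncol = 10,
              dimnames = list(NULL, paste0("variable_", 1:10)))

# Square dissimilarity matrix based on the correlation coefficient
dissim <- 1 - cor(dat)

# as.dist() turns the square matrix into a "dist" object, keeping the labels
corDist <- as.dist(dissim)

# Cluster it exactly as above
corTree <- hclust(corDist)
```

The column names of the matrix become the leaf labels of the tree.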

For the plot, the dendrogram class provides a useful plot method. Just convert the hclust output to a dendrogram and plot it:

dendro <- as.dendrogram(tree)

The plot method provides a horiz argument that swaps the X and Y axes. Try the following:

plot(dendro, horiz=TRUE)
plot(dendro, horiz=FALSE)

Manage its size

Readability depends on the device you use to export the image. R can produce very large images; it is up to the user to set the size and resolution. See the man pages for png and pdf for further details (width, height and res are the relevant arguments; res applies to png only).
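
As a rough sketch of sizing the device to the tree, assuming a made-up tree with 200 labelled leaves and an arbitrary rule of thumb of 0.2 inch of page height per leaf (the data, the file name and the margin values are all assumptions to tune by hand):

```r
# Hypothetical tree with 200 labelled leaves (illustration only)
set.seed(1)
m <- matrix(rnorm(200 * 5), nrow = 200,
            dimnames = list(paste("sample label", 1:200), NULL))
dendro <- as.dendrogram(hclust(dist(m)))

# Number of leaves stored on the root node of the dendrogram
n_leaves <- attr(dendro, "members")

# Assumed rule of thumb: about 0.2 inch of height per leaf
out_file <- file.path(tempdir(), "dendrogram.pdf")
pdf(out_file, width = 10, height = 0.2 * n_leaves)

# Wide right margin (in text lines) so the horizontal labels are not clipped
par(mar = c(4, 2, 2, 10))
plot(dendro, horiz = TRUE)
dev.off()
```

For a bitmap instead, `png` takes the same width and height together with `units = "in"` and a `res` value in pixels per inch.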

Another track to follow is the graphical parameters: by playing with the various cex values, you can resize the labels. See the man page of par for further details.
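
Two ways of shrinking the labels can be sketched as follows, assuming a small made-up tree and an arbitrary scaling factor of 0.6. The first uses the global `cex` parameter, the second the `lab.cex` component of the `nodePar` argument of the dendrogram plot method:

```r
# Hypothetical tree with 30 labelled leaves (illustration only)
set.seed(1)
m <- matrix(rnorm(30 * 5), nrow = 30,
            dimnames = list(paste("label", 1:30), NULL))
dendro <- as.dendrogram(hclust(dist(m)))

# Null pdf device: nothing is written to disk, replace with a real file name
pdf(NULL)

# Option 1: shrink all text globally, then restore the previous setting
op <- par(cex = 0.6)
plot(dendro, horiz = TRUE)
par(op)

# Option 2: shrink only the leaf labels; pch = NA suppresses node symbols
plot(dendro, horiz = TRUE, nodePar = list(lab.cex = 0.6, pch = NA))

dev.off()
```

The value 0.6 is only a starting point; the right factor depends on the device size and the label lengths.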

Readability is a human judgment, so I don't think you will find a fully automated way to obtain a readable plot, but with a little manual tuning you can achieve it with the tools I mentioned. If automation is mandatory, you can use par values reported by R, such as cin (character size in inches), to predict the needed device width, but it is much simpler to tune it manually.

New axis

The axis function can help you: it adds an axis to an existing plot, and side = 3 places it at the top.
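
A minimal sketch of the second axis, reusing the dummy distance matrix from above and assuming the default tick positions are acceptable:

```r
# Same dummy tree as above
dendro <- as.dendrogram(hclust(dist(matrix(1:9, ncol = 3))))

# Null pdf device: nothing is written to disk, replace with a real file name
pdf(NULL)

# Horizontal plot: the distance axis is drawn at the bottom
plot(dendro, horiz = TRUE)

# Repeat the distance axis at the top; axis() returns the tick positions
ticks <- axis(side = 3)

dev.off()
```

Pass `at` and `labels` to `axis` if the default ticks do not suit the poster.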

maressyl

Took me a while to get this:

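# Assumes x (data matrix, one column per leaf), distance (a dist object,
# presumably something like as.dist(1 - cor(x))) and fout (output file name)
# are already defined.
# get font factor: lines of text per inch at the default point size
pdf(NULL); ff <- 72 / par("ps"); dev.off()
# if there are more than 20 entries
if (ncol(x) > 20) {
    # scale the page height by the font size: one text line per entry
    pdf(fout, height = ncol(x) / ff)
} else {
    pdf(fout)
}
# increase right margin width to make room for the longest label
op <- par(mar = par("mar") + c(0, 0, 0, 2 * max(nchar(colnames(x))) / ff))
# plot horizontally
plot(as.dendrogram(hclust(distance), hang = -1), main = "Dissimilarity = 1 - Correlation", xlab = "", horiz = TRUE)
# restore margin
par(op)
dev.off()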
Erik Aronesty