
I would like to improve my dendrogram that I made using the pvclust package. I am not able to see most AU / BP labels, as you can see in the image.

Could you help me solve this? I would like to see all AU / BP labels on the dendrogram.

Below is executable code.

Thank you!

library(rdist)
library(pvclust)
library(geosphere)

df<-structure(list(Latitude = c(-23.8, -23.8, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9,
-23.9, -23.9, -23.9, -23.9, -23.9), Longitude = c(-49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.7,
-49.7, -49.7, -49.7, -49.7, -49.6, -49.6, -49.6, -49.6), Waste = c(526, 350, 526, 469, 285, 175, 175, 350, 350, 175, 350, 175, 175, 364,
175, 175, 350, 45.5, 54.6)), class = "data.frame", row.names = c(NA, -19L))

coordinates<-subset(df,select=c("Latitude","Longitude")) 
d<-as.dist(distm(coordinates[,2:1]))
mat <- as.matrix(d)
mat <- t(mat)
fit <- pvclust(mat, method.hclust="average", method.dist="euclidean", 
               nboot=1000, r=seq(0.9,1.4,by=.1))
fit
plot(fit,hang=-1,cex=.8,main="Average Linkage Clustering")
pvrect(fit, alpha=.80, pv="au", type="geq")

[Image: dendrogram produced by the code above]

Considering 325 locations

[Image: dendrogram for the 325 locations]

1 Answer

The simplest way is to change the size of the plot window and increase the hang= argument:

x11(width=12, height=8) # on macOS use quartz(width=12, height=8); on Windows use windows(width=12, height=8)
plot(fit,hang=.05,cex=.8,main="Average Linkage Clustering")
pvrect(fit, alpha=.80, pv="au", type="geq")

[Image: dendrogram]

Here is an example with 150 cases (about half the 325 you have, but from a data set that is included with R):

data(iris)
mat <- t(as.matrix(iris[, 1:4]))
fit <- pvclust(mat, method.hclust="average", method.dist="euclidean",
               nboot=1000, r=seq(0.9,1.4,by=.1))

Now print the results to pdf:

pdf(file="Dendrogram.pdf", width=13, height=7.5)
plot(fit,hang=.05, cex=.5, cex.pv=.5, main="Average Linkage Clustering")
pvrect(fit, alpha=.80, pv="au", type="geq")
dev.off()
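
As a rough guide for choosing the `width=` of the device, the arithmetic can be sketched as below. This is only a rule of thumb, not something pvclust provides: `dendro_width()` is a hypothetical helper, and it assumes the leaf labels are drawn vertically, that each label needs about one line of text along the x-axis, and that the device uses the default 12 pt font and default margins. It only accounts for the leaf labels; AU / BP values at nodes that merge at nearly the same height may still collide.

```r
# Hypothetical helper (not part of pvclust): approximate device width in
# inches that gives each of n_leaves vertical leaf labels one line of text.
dendro_width <- function(n_leaves, cex = 0.5, pointsize = 12,
                         line_spacing = 1.2, margin_lines = 6.2) {
  # Height of one line of text at cex = 1, in inches (0.2 in at 12 pt)
  line_in <- pointsize / 72 * line_spacing
  # Space for the labels plus the default left and right margins (4.1 + 2.1 lines)
  n_leaves * cex * line_in + margin_lines * line_in
}

w150 <- dendro_width(150)  # the iris example, labels at cex = 0.5
w325 <- dendro_width(325)  # the 325 locations, labels at cex = 0.5
print(c(w150, w325))
```

The result can be passed straight to the device, for example `pdf(file="Dendrogram.pdf", width=dendro_width(325), height=20)`. A smaller `cex` or a narrower width still works if some overlap between labels is acceptable.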

[Image: dendrogram]

The pdf has better resolution, so the text overlaps less. The other option is to reduce the labelling:

plot(fit,hang=.05, cex=.5, cex.pv=.5, print.num=FALSE, print.pv=FALSE, 
     labels=FALSE, main="Average Linkage Clustering")
pvrect(fit, alpha=.80, pv="au", type="geq")

This prints just the dendrogram without any labeling so you can see the structure but not the details. In some cases the data represent several groups. The iris data include three species. You can label just species by changing to labels=rep(1:3, each=50) so that the numbers 1, 2, 3 identify the three species.
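
If the plot carries less text, the AU / BP values can still be read from the fitted object rather than from the figure. The sketch below assumes the `edges` component of a pvclust result, a data frame with one row per edge that includes `au` and `bp` columns; `nboot=10` is used only to keep the example fast, so the values themselves are not meaningful.

```r
library(pvclust)

# Same iris setup as above, with very few bootstrap replicates for speed
mat <- t(as.matrix(iris[, 1:4]))
fit <- pvclust(mat, method.hclust="average", method.dist="euclidean",
               nboot=10, r=seq(0.9,1.4,by=.1))

# One row per edge; row numbers correspond to the edge numbers that
# print.num=TRUE draws on the dendrogram. Scaled to percent as in the plot.
pv <- round(100 * fit$edges[, c("au", "bp")])
head(pv)
```

The table can then be written out with `write.csv(pv, "pvalues.csv")` and looked up by edge number when the plot is drawn with `print.pv=FALSE` but `print.num=TRUE`.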

dcarlson
  • Thank you dcarlson, you answered my question and I will accept. However, I applied your approach to my actual data, which has 325 locations instead of the 19 in the example (I used 19 only to make it easier to run). It improved the graph, but is there a way to improve it even more? I inserted the image above so you can see how it turned out. If you wish, I can send you the database with the 325 locations. –  Dec 15 '20 at 20:20
  • With 325 locations you must decide what is important to show. Your computer monitor is only so large and its resolution is lower than paper. For higher resolution you need to print to a file such as `.pdf`, `.svg`, or `.tiff`. Then you can make the font size smaller. I'll add an example above that prints a pdf sized to fit legal paper. The other approach is to plot less information. The options that remove text from the plot are `print.num=FALSE`, `print.pv=FALSE`, and `labels=FALSE`. – dcarlson Dec 15 '20 at 22:02
  • Thank you so much for your answer. I really liked the graph you made in the PDF; I could see the numbers well. I wanted to do something similar, but when I tried it on my data it still didn't look very good. I may be missing something. If it is not too much trouble, could you test with the database at this link: https://github.com/JovaniSouza/JovaniSouza5/blob/master/database.xlsx It is the database I am using. If you do, please use nboot = 5 in the pvclust function, because nboot = 1000 takes too long. Thank you again. –  Dec 15 '20 at 23:35
  • Try `pdf(file="DendrogramANSI-D.pdf", width=32, height=20) # ANSI D 22 in. x 34 in.` This will be too big to print on most printers, but you can zoom in on your computer screen and read the numbers. – dcarlson Dec 16 '20 at 01:24
  • Thanks again dcarlson! You helped me a lot. –  Dec 16 '20 at 01:38
  • dcarlson, could you please take a look at the following question: https://stats.stackexchange.com/questions/507685/interpret-results-obtained-by-the-pvclust-package-of-software-r –  Feb 02 '21 at 15:24