I am constructing a ggpairs figure in R using the following code.

df is a data frame containing six continuous variables and one Group variable.

ggpairs(df[, -1], columns = 1:ncol(df[, -1]),
        mapping = ggplot2::aes(colour = df$Group), legends = T, axisLabels = "show",
        upper = list(continuous = wrap("cor", method = "spearman", size = 2.5, hjust = 0.7))) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"))

I am trying to add the p-value of the Spearman correlation to the upper panel of the generated figure, i.e. appended to the Spearman correlation coefficient.

Generally, the p-value is computed using cor.test with method = "spearman".
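
A minimal sketch of that computation, using the built-in mtcars data purely as a stand-in for df. The choice of columns and exact = FALSE (to avoid the warning cor.test gives when the data contain ties) are assumptions added for illustration:

```r
# Illustration only: mtcars stands in for the real data
res <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)

rho <- unname(res$estimate)  # Spearman's rho
p   <- res$p.value           # p-value of the test

cat("rho =", round(rho, 2), "\n")
cat("p   =", signif(p, 2), "\n")
```

Both numbers come from the same cor.test result, which is what any custom panel function for ggpairs would need to print.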

I am also aware of a Stack Overflow post discussing a similar query, but I need this for ggpairs, where that solution does not work. That earlier question has also not been solved yet.

How to add p values for Spearman correlation coefficients plotted using pairs in R

StupidWolf
Praveen Kumar-M

3 Answers

Not sure whether it's because you have groups or are using a different version of the package (I'm using GGally_2.1.1), but the following code works perfectly for me.

df %>% ggpairs(upper = list(continuous = wrap("cor", method = "spearman")))
Yixin Zhao

I have a feeling this is more than what you expected, but you need to define a custom function similar to ggally_cor. First, we write a function that returns the Spearman correlation and its p-value for two variables as a text label:

printVar <- function(x, y){
  vals <- cor.test(x, y, method = "spearman")[c("estimate", "p.value")]
  names(vals) <- c("rho", "p")
  paste(names(vals), signif(unlist(vals), 2), collapse = "\n")
}
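
To see what this helper returns before wiring it into ggpairs, here is a small self-contained check on two mtcars columns. The definition is repeated so the snippet runs on its own, and suppressWarnings is an addition (not part of the answer) to hide the ties warning from cor.test:

```r
printVar <- function(x, y) {
  vals <- cor.test(x, y, method = "spearman")[c("estimate", "p.value")]
  names(vals) <- c("rho", "p")
  paste(names(vals), signif(unlist(vals), 2), collapse = "\n")
}

# cor.test warns about ties in mtcars; that is expected for Spearman
lab <- suppressWarnings(printVar(mtcars$mpg, mtcars$wt))
cat(lab, "\n")  # two lines: "rho <value>" and "p <value>"
```

The result is a single string with an embedded newline, which is why it can be passed directly as a text label.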

Then we define a function that takes the data for each pair of variables, calculates (1) the overall correlation and (2) the correlation by group, and passes both to a ggplot that prints only this text:

my_fn <- function(data, mapping, ...){
  # takes in x and y for each panel
  xData <- eval_data_col(data, mapping$x)
  yData <- eval_data_col(data, mapping$y)
  colorData <- eval_data_col(data, mapping$colour)

  # if you have colors, split according to color group and calculate cor
  byGroup <- by(data.frame(xData, yData), colorData, function(i) printVar(i[, 1], i[, 2]))
  byGroup <- data.frame(col = names(byGroup), label = as.character(byGroup))
  byGroup$x <- 0.5
  byGroup$y <- seq(0.8 - 0.3, 0.2, length.out = nrow(byGroup))

  # main correlation
  mainCor <- printVar(xData, yData)

  p <- ggplot(data = data, mapping = mapping) +
    annotate(x = 0.5, y = 0.8, label = mainCor, geom = "text", size = 3) +
    geom_text(data = byGroup, inherit.aes = FALSE,
              aes(x = x, y = y, col = col, label = label), size = 3) +
    theme_void() + ylim(c(0, 1))
  p
}
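
The grouping step is the least obvious part, so here it is in isolation as a base-R sketch that does not need GGally. The variables x, y and grp are hypothetical stand-ins for xData, yData and colorData, and the label shows only rho to keep it short:

```r
# Hypothetical stand-ins for the xData / yData / colorData that my_fn extracts
x   <- mtcars$mpg
y   <- mtcars$wt
grp <- ifelse(mtcars$am == 1, "manual", "automatic")

# One Spearman rho per group; by() returns an object named by group level
rho_by_group <- by(data.frame(x, y), grp,
                   function(d) cor(d[, 1], d[, 2], method = "spearman"))

# Flatten into one row per group, with stacked y positions for geom_text
labels <- data.frame(
  col   = names(rho_by_group),
  label = sprintf("rho %.2f", as.numeric(rho_by_group)),
  x     = 0.5,
  y     = seq(0.5, 0.2, length.out = length(rho_by_group))
)
print(labels)
```

Each row of this data frame becomes one coloured text line in the panel; with more than two groups, the y range may need widening to avoid overlap.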

As an example I use mtcars, with a randomly assigned Group as the first column:

df <- data.frame(
  Group = sample(LETTERS[1:2], nrow(mtcars), replace = TRUE),
  mtcars[, 1:6]
)

And plot:

ggpairs(df[, -1], columns = 1:ncol(df[, -1]),
        mapping = ggplot2::aes(colour = df$Group),
        axisLabels = "show",
        upper = list(continuous = my_fn)) +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"))

For your own plot the spacing of the text might not be optimal, but that is just a matter of tweaking my_fn.

StupidWolf
  • you should have blog page for how to create function step by step ..i would be spending most of my times there – PesKchan Sep 05 '22 at 18:03

This works well, but rounding with signif is probably not ideal for the p-value. signif does keep two significant digits, but a very small p-value is then printed in scientific notation (for example 1.2e-05). The round function is not a good fit either: with two decimal places, any p-value below 0.005 is printed as 0, so a value below 0.001 and a value just under 0.005 both appear as 0.
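
The behaviour is easy to check in the console. In this sketch p_small is an arbitrary example value and fmt_p is a hypothetical helper name, shown only to illustrate reporting thresholds instead of tiny numbers:

```r
p_small <- 1.2345e-05

paste("p", signif(p_small, 2))  # "p 1.2e-05": scientific notation
paste("p", round(p_small, 2))   # "p 0": the value is lost
round(0.004, 2)                 # 0
round(0.009, 2)                 # 0.01

# Hypothetical helper: report thresholds instead of tiny numbers
fmt_p <- function(p) {
  if (p < 0.001) "<0.001" else if (p < 0.01) "<0.01" else as.character(round(p, 2))
}
fmt_p(p_small)  # "<0.001"
fmt_p(0.004)    # "<0.01"
fmt_p(0.2345)   # "0.23"
```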

So a mild modification of the code will take care of it.

printVar <- function(x, y){
  vals <- cor.test(x, y, method = "spearman")[c("estimate", "p.value")]

  vals[[1]] <- round(vals[[1]], 2)
  vals[[2]] <- ifelse(test = vals[[2]] < 0.001, "<0.001",
                      ifelse(test = vals[[2]] < 0.01, "<0.01", round(vals[[2]], 2)))

  names(vals) <- c("rho", "p")
  paste(names(vals), unlist(vals), collapse = "\n")
}

Secondly, running the code as posted gives an error that LAB is not found.

LAB is the character string used for the label.

You can either supply a character string or just pass

LAB=c()

Praveen Kumar-M
  • thanks for pointing out the error. I have corrected the LAB. you don't need to manual input. it should be mainCor. And btw that piece of code was not easy to hack.. minor issues such as rounding etc.. it's specific to your own use. – StupidWolf May 09 '20 at 09:50
  • Ok. I think you are wanting to say "And btw that piece of code was easy to hack" – Praveen Kumar-M May 09 '20 at 14:54