Calculating correlation p value between all columns of dataframe A and all columns of dataframe B in R

Question

I have two dataframes A and B. I would like to calculate the correlation coefficient and the accompanying p-value between all columns of A and all columns of B.

The correlation coefficients are easy to get with the cor() function, which accepts both data frames at once. However, cor.test() only tests one pair of vectors at a time, so it cannot be used the same way. How do I do this?

Does this answer your question? [A matrix version of cor.test()](https://stackoverflow.com/questions/13112238/a-matrix-version-of-cor-test) — GordonShumway, Jan 31 '20 at 16:33
The link by @GordonShumway provides an answer. [This answer](https://stackoverflow.com/a/13112337/8245406) can be adapted to the two df's case. If you cannot do it, say something I will post code. — Rui Barradas, Jan 31 '20 at 17:07
@GordonShumway and Rui Barradas, thanks for the help, I was able to get what I wanted! — Domu, Feb 03 '20 at 17:01

score 0 · Answer 1 · edited Jun 20 '20 at 09:12

This answer shows how to find the pairwise p-values for 4 variables, assumed here to be the columns of a single data frame df.

Create a square matrix for the p-values:

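library(gdata)  # provides lowerTriangle<-
# df is assumed to be a data frame with 4 numeric columns
mat.pvals <- matrix(NA, 4, 4)
prs <- combn(1:4, 2)  # all pairs of column indices, one pair per column
lowerTriangle(mat.pvals) <- sapply(seq_len(ncol(prs)), function(x)
  cor.test(df[, prs[1, x]], df[, prs[2, x]], alternative = "two.sided")$p.value)
mat.pvals <- t(mat.pvals)  # transpose so the p-values sit in the upper triangle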

Round to 5 decimal places:

round(mat.pvals, 5)
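
The answer leaves df undefined, so the following is a minimal, self-contained sketch of the same idea. It assumes a made-up data frame of four numeric columns and swaps gdata's lowerTriangle for base R matrix indexing, so it is an illustration rather than the answerer's exact method.

```r
set.seed(1)
# Hypothetical data: 30 observations of 4 numeric variables
df <- data.frame(matrix(rnorm(120), ncol = 4))

k <- ncol(df)
prs <- combn(k, 2)  # 2 x choose(k, 2) matrix: one column per pair (i, j), i < j

# p-value of cor.test() for every pair of columns
p <- apply(prs, 2, function(ij) cor.test(df[[ij[1]]], df[[ij[2]]])$p.value)

mat.pvals <- matrix(NA_real_, k, k, dimnames = list(names(df), names(df)))
mat.pvals[t(prs)] <- p          # fill cell (i, j) for each pair
mat.pvals[t(prs)[, 2:1]] <- p   # mirror into cell (j, i)
round(mat.pvals, 5)
```

Indexing with a two-column matrix of (row, column) positions avoids relying on the order in which a triangle is filled, and the diagonal stays NA because a variable is not tested against itself.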

answered Jan 31 '20 at 17:03

Aashay Mehta

score 0 · Answer 2 · answered Jan 31 '20 at 17:33

Here is a simple way to handle two data frames. First make some data:

set.seed(42)
A <- data.frame(matrix(rnorm(100), 25))
B <- data.frame(A + matrix(rnorm(100, 0, .1), 25))

Now create vectors of the row and column indices and use mapply to make a list:

a <- rep(seq_len(ncol(A)), ncol(B))
b <- rep(seq_len(ncol(B)), each=ncol(A))
all <- mapply(function(x, y) cor.test(A[, x], B[, y]), a, b, SIMPLIFY=FALSE)
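
To see how the two index vectors pair up, here is a small sketch with hypothetical sizes (3 columns in A, 2 in B; the names nA and nB are only for illustration). For every column of A to meet every column of B when the counts differ, each B index has to be repeated ncol(A) times.

```r
# Hypothetical sizes, chosen unequal on purpose
nA <- 3
nB <- 2

a <- rep(seq_len(nA), nB)          # 1 2 3 1 2 3  (A index varies fastest)
b <- rep(seq_len(nB), each = nA)   # 1 1 1 2 2 2
cbind(a, b)                        # one row per (A column, B column) pair
```

Because the A index varies fastest, reshaping the results into a matrix with ncol(A) rows puts the columns of A on the rows and the columns of B on the columns.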

The list includes all of the information for each correlation test, e.g.

all[[1]]
#   Pearson's product-moment correlation
# 
# data:  A[, x] and B[, y]
# t = 61.1, df = 23, p-value <2e-16
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
#  0.99295 0.99867
# sample estimates:
#     cor 
# 0.99694

Now extract just the correlations and p-values from the list. In the resulting matrices, the rows correspond to the columns of A and the columns to the columns of B:

# Results with A=rows, and B=columns
corrs <- matrix(sapply(all, "[[", "estimate"), ncol(A))
pvals <- matrix(sapply(all, "[[", "p.value"), ncol(A))
round(corrs, 4)
#         [,1]    [,2]    [,3]    [,4]
# [1,]  0.9969 -0.0870 -0.3249 -0.0690
# [2,] -0.0582  0.9964 -0.1256 -0.1166
# [3,] -0.3216 -0.1456  0.9955  0.1324
# [4,] -0.0765 -0.1497  0.1411  0.9958
round(pvals, 4)
#        [,1]   [,2]   [,3]   [,4]
# [1,] 0.0000 0.6794 0.1130 0.7431
# [2,] 0.7822 0.0000 0.5498 0.5788
# [3,] 0.1169 0.4873 0.0000 0.5282
# [4,] 0.7161 0.4750 0.5010 0.0000
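
The steps above can be wrapped into a small helper. This is a sketch only: cor_test_matrix is a hypothetical name, not a function from any package, and the example data (3 columns in A, 2 in B) are made up to show that unequal column counts work. It also labels the rows and columns with the data frames' column names.

```r
# Hypothetical helper (not from the answer): pairwise cor.test() between
# the columns of two data frames that have the same number of rows.
cor_test_matrix <- function(A, B, ...) {
  stopifnot(nrow(A) == nrow(B))
  # One row per (A column, B column) pair; the A index varies fastest
  idx <- expand.grid(a = seq_len(ncol(A)), b = seq_len(ncol(B)))
  tests <- Map(function(a, b) cor.test(A[[a]], B[[b]], ...), idx$a, idx$b)
  # Reshape a vector of results into an ncol(A) x ncol(B) matrix
  shape <- function(v) matrix(v, nrow = ncol(A), ncol = ncol(B),
                              dimnames = list(names(A), names(B)))
  list(
    estimate = shape(vapply(tests, function(tst) unname(tst$estimate), numeric(1))),
    p.value  = shape(vapply(tests, function(tst) tst$p.value, numeric(1)))
  )
}

set.seed(42)
A <- data.frame(matrix(rnorm(75), 25))   # 25 rows, 3 columns
B <- data.frame(matrix(rnorm(50), 25))   # 25 rows, 2 columns
names(B) <- c("Y1", "Y2")

res <- cor_test_matrix(A, B)
round(res$estimate, 4)
round(res$p.value, 4)
```

Extra arguments such as method = "spearman" are passed through to cor.test() via the dots. With the default Pearson method, res$estimate should agree with cor(A, B), which gives a quick sanity check.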
