Correlation/p values of all combinations of all rows of two matrices

Question

I would like to calculate the correlation and the p value of that correlatio of each species (bac) to each of the factors (fac) in a second data frame. Both were measured at the same number of stations, but the number of bac and fac don't match.

bac1 <- c(1,2,3,4,5)
bac2 <- c(2,3,4,5,1)
bac3 <- c(4,5,1,2,3)
bac4 <- c(5,1,2,3,4)
bac <- as.data.frame(cbind(bac1, bac2, bac3, bac4 ))
colnames(bac) <- c("station1", "station2", "station3", "station4")
rownames(bac) <- c("bac1", "bac2", "bac3", "bac4", "bac5")

fac1 <- c(1,2,3,4,5,6)
fac2 <- c(2,3,4,5,1,6)
fac3<- c(3,4,5,1,2,6)
fac4<- c(4,5,1,2,3, 6)
fac <- as.data.frame(cbind(fac1, fac2, fac3, fac4))
colnames(fac) <- c("station1", "station2", "station3", "station4")
rownames(fac) <- c("fac1", "fac2", "fac3", "fac4", "fac5", "fac6")

I imagine the result looking somewhat like this, somewhere keeping the names to know which combination is presented:

bac1-fac1 cor1 p1
bac1-fac2 cor2 p2
bac1-fac3 cor3 p3

bac2-fac1 corx px...

I have looked at function rcorr from Hmist and corr.test from psych, but can't find an example with the neccessary permutation of rows...Any ideas?

score 4 · Accepted Answer · edited May 23 '17 at 12:08

4

If you restructure your data, such that you compute correlation between paired columns, it would be super easy.

tbac <- data.frame(t(bac))
tfac <- data.frame(t(fac))

f <- function (x, y) cor(x, y)

tab <- outer(tfac, tbac, Vectorize(f))

as.data.frame.table(tab)

I had an answer using the same idea: Match data and count number of same value.

edited May 23 '17 at 12:08

Community

1
1

answered Jan 22 '17 at 16:29

Zheyuan Li

71,365
17
180
248

Very compact way of doing this. Great answer as always! – akrun Jan 22 '17 at 16:37
I didn't remember it, but thanks for sharing that one. – akrun Jan 22 '17 at 16:41
1

Thank you, this seems to work very well! I was wondering why any correlation with fac6 produced NA, but figured it out (all values are 6). – Helena Jan 22 '17 at 17:08

user20650 · Answer 2 · 2017-01-22T18:39:19.450

You can just pass the full matrices to the cor function (or psych::corr.test)and it takes care of finding the correlation of the relevant columns.

For example

cor(t(fac), t(bac))
#            bac1        bac2        bac3        bac4        bac5
# fac1  0.9899495 -0.07559289 -0.60000000 -0.60000000 -0.07559289
# fac2  0.9899495 -0.07559289 -0.60000000 -0.60000000 -0.07559289
# fac3 -0.3207135  0.94285714 -0.07559289 -0.07559289 -0.48571429
# fac4 -0.8000000 -0.32071349  0.98994949  0.98994949 -0.32071349
# fac5 -0.3207135 -0.48571429 -0.07559289 -0.07559289  0.94285714
# fac6         NA          NA          NA          NA          NA

You can then turn this in to long format using reshape2::melt

reshape2::melt(cor(t(fac), t(bac)))
#    Var1 Var2       value
# 1  fac1 bac1  0.98994949
# 2  fac2 bac1  0.98994949
# 3  fac3 bac1 -0.32071349
# 4  fac4 bac1 -0.80000000
# ---
# ---

To get the p-values use the same approach

test <- psych::corr.test(t(fac), t(bac), adjust="none")

And melt as before and join

merge(melt(test$r, value.name="cor"), melt(test$p, value.name="p-value"), by=c("Var1", "Var2"))
#   Var1 Var2         cor    p-value
# 1 fac1 bac1  0.98994949 0.01005051
# 2 fac1 bac2 -0.07559289 0.92440711
# 3 fac1 bac3 -0.60000000 0.40000000
# 4 fac1 bac4 -0.60000000 0.40000000
# 5 fac1 bac5 -0.07559289 0.92440711
# 6 fac2 bac1  0.98994949 0.01005051

That is a great option. I missed the transpose part. – akrun Jan 23 '17 at 03:15 — akrun, Jan 23 '17 at 03:15

score 1 · Answer 3 · answered Jan 22 '17 at 16:27

1

We can use expand.grid to get the combinations of rownames of 'bac' and 'fac', loop through the rows with apply specifying the MARGIN as 1, subset the rows of 'bac' and 'fac' based on the rownames, do the corr.test and extract the 'p' values as a list

library(psych)
do.call(c, apply(expand.grid(rownames(bac), rownames(fac)), 1, 
  function(x) list(corr.test(cbind(unlist(bac[1,]), unlist(fac[1,])))$p)))

answered Jan 22 '17 at 16:27

akrun

874,273
37
540
662

@李哲源ZheyuanLi It gets you some of the other parameters as welll in a `list`. I think `rcorr` also does similar thing from `Hmisc` – akrun Jan 22 '17 at 16:35
I recently discovered expand.grid and I really liked it. But I try your solution, the output does not seem correct...and I don't have any row/columnnames? – Helena Jan 22 '17 at 17:14

Ryan · Answer 4 · 2017-01-22T17:21:11.463

You can just loop over the rows of expand.grid

pairs <- as.matrix(expand.grid(1:nrow(bac),1:nrow(fac)))
pairs <- cbind(pairs,NA,NA)
b <- as.matrix(bac)
f <- as.matrix(fac)
for(i in 1:nrow(pairs)){
    pairs[i,3] <- cor(b[pairs[i,1],], f[pairs[i,2],])
    pairs[i,4] <- cor.test(b[pairs[i,1],], f[pairs[i,2],])$p.value
}
colnames(pairs) <- c('bac','fac','corr','p')
pairs
##      bac fac        corr          p
## [1,]   1   1  0.98994949 0.01005051
## [2,]   2   1 -0.07559289 0.92440711
## [3,]   3   1 -0.60000000 0.40000000
## [4,]   4   1 -0.60000000 0.40000000
## [5,]   5   1 -0.07559289 0.92440711
## [6,]   1   2  0.98994949 0.01005051

If you want the names you can then do

pairs <- as.data.frame(pairs)
pairs[,1] <- sapply(pairs[,1],function(x) rownames(bac)[x])
pairs[,2] <- sapply(pairs[,2],function(x) rownames(fac)[x])

although at that point it's probably easier to use 李哲源 Zheyuan Li 's solution.

Thanks, also very helpful, but does not keep the original names, which would be helpful in my "real" case! — Helena, Jan 22 '17 at 17:11

Correlation/p values of all combinations of all rows of two matrices

4 Answers4

Linked

Related