How to create a symmetric matrix in R counting how often two columns have the same values?

Question

Suppose I have a dataframe like this:

ID sp1 sp2 sp3
1  NA   1   1
2  0    0   1
3  1    NA  0
4  1    1   1

Here is what I wanted to get:

which shows the number of times two columns have the same value 1 here.

As the original dataframe is quite large, I hope to find an efficient way to do this.

Thank you very much for any efforts.

Shouldn't the cell in 4:4 have a value of 3? – Lamia Apr 17 '20 at 16:33
@Lamia Yes, thank you very much. I've modified it. – YannZ Apr 17 '20 at 17:11

score 2 · Accepted Answer · answered Apr 17 '20 at 17:19


To create a co-occurrence matrix from your data, first convert the NAs to 0s, then drop the ID column and take the cross-product of the transposed data. With 0/1 values, each cell of the result counts the columns in which both rows have a 1:

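# Example data from the question
x <- data.frame(ID = 1:4, sp1 = c(NA, 0, 1, 1), sp2 = c(1, 0, NA, 1), sp3 = c(1, 1, 0, 1))
x[is.na(x)] <- 0       # treat missing values as 0
crossprod(t(x[-1]))    # drop ID, then count shared 1s for every pair of rows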

     [,1] [,2] [,3] [,4]
[1,]    2    1    0    2
[2,]    1    1    0    1
[3,]    0    0    1    1
[4,]    2    1    1    3
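As a cross-check on the output above, here is a minimal sketch of the same computation with the rows labelled by ID, so individual cells are easier to inspect. It assumes the example data from the question; the names `m` and `co` are only illustrative. `tcrossprod(m)` is base R shorthand for `m %*% t(m)`, which gives the same result as `crossprod(t(m))`.

```r
# Example data from the question
x <- data.frame(ID = 1:4, sp1 = c(NA, 0, 1, 1), sp2 = c(1, 0, NA, 1), sp3 = c(1, 1, 0, 1))

m <- as.matrix(x[-1])   # numeric matrix without the ID column
m[is.na(m)] <- 0        # treat missing values as 0
rownames(m) <- x$ID     # label rows by ID (stored as character names)

# Cell (i, j) is sum over columns of m[i, ] * m[j, ],
# i.e. the number of columns where rows i and j are both 1
co <- tcrossprod(m)

# Check one cell by hand: rows 1 and 4 share a 1 in sp2 and sp3
sum(m["1", ] == 1 & m["4", ] == 1)
co["1", "4"]
```

The diagonal holds the number of 1s in each row, which is why the cell at 4:4 is 3.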

answered Apr 17 '20 at 17:19

Lamia


Thanks @Lamia, I think this is what I need. But maybe my data were too large: it failed with the error 'cannot allocate vector of size 500Gb'. Thank you all the same, and I'll try to subset the dataset. – YannZ Apr 17 '20 at 18:21
Have a look at crossprod of sparse matrices in the `Matrix` package. It should be more memory efficient. – Lamia Apr 17 '20 at 18:37
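The sparse-matrix suggestion could look roughly like the sketch below. It assumes the `Matrix` package is installed and reuses the example data from the question; `sm` and `co` are illustrative names. The sparse form only saves memory when most entries are 0. The result still has one row and one column per ID, so it stays manageable only if most pairs of rows share no 1s.

```r
library(Matrix)

# Example data from the question
x <- data.frame(ID = 1:4, sp1 = c(NA, 0, 1, 1), sp2 = c(1, 0, NA, 1), sp3 = c(1, 1, 0, 1))

m <- as.matrix(x[-1])   # numeric matrix without the ID column
m[is.na(m)] <- 0        # treat missing values as 0

sm <- Matrix(m, sparse = TRUE)   # store only the non-zero entries
co <- tcrossprod(sm)             # sparse equivalent of crossprod(t(m))

co   # zero cells are printed as "."
```

For a large dataset it is probably better to build the sparse matrix directly rather than going through a dense `as.matrix()` first, for example with `Matrix::sparseMatrix()` from the row and column indices of the 1s.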
