I have data of the form:
df <- data.frame(group = c(rep(1,5),rep(2,5),rep(3,5),rep(4,5),rep(5,5)),
                 thing = c(rep(c('a','b','c','d','e'),5)),
                 score = c(1,1,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,0,1,0,1,0,1,0))
which reports the "score" for each "thing" for a bunch of "group"s.
I would like to create the correlation matrix of pairwise score correlations between all "thing"s, where each correlation is computed across groups:
        thing_a thing_b thing_c thing_d thing_e
thing_a 1       .       .       .       .
thing_b corr    1       .       .       .
thing_c corr    corr    1       .       .
thing_d corr    corr    corr    1       .
thing_e corr    corr    corr    corr    1
For example, the data underlying the correlation between thing "a" and thing "b" would be:
group  thing_a_score  thing_b_score
1      1              1
2      1              1
3      1              1
4      0              1
5      0              1
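To make the target concrete, here is a minimal sketch on the toy data only. It assumes every group/thing pair appears exactly once, and the names `wide`, `cmat`, `idx` and `pairs` are just illustrative. It pivots the data to a group-by-thing matrix with `tapply` and calls `cor` on the columns. Note that thing "b" is constant in this toy data, so its correlations come out as `NA` (zero standard deviation). I have not checked how this behaves at the real data size.

```r
# Toy data from above, repeated so the block runs on its own
df <- data.frame(group = rep(1:5, each = 5),
                 thing = rep(c('a','b','c','d','e'), 5),
                 score = c(1,1,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,0,1,0,1,0,1,0))

# Pivot to a matrix with one row per group and one column per thing
# (assumes exactly one score per group/thing pair, so sum() just passes it through)
wide <- tapply(df$score, list(df$group, df$thing), sum)

# The two columns underlying the a-b correlation
wide[, c("a", "b")]

# Pairwise Pearson correlations between columns; cor() warns about the
# constant column "b", so the warning is suppressed here
cmat <- suppressWarnings(cor(wide))

# Optional long format: one row per pair (thing_1, thing_2, corr)
idx <- which(lower.tri(cmat), arr.ind = TRUE)
pairs <- data.frame(thing_1 = rownames(cmat)[idx[, "row"]],
                    thing_2 = colnames(cmat)[idx[, "col"]],
                    corr    = cmat[idx])
```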
In reality, the number of unique groups is ~1,000 and the number of things is ~10,000, so I need an approach that is more efficient than a brute-force for loop.
I don't need the resulting correlations to be in a single matrix, or even in a matrix per se (i.e., it could be a bunch of data sets with three columns "thing_1 thing_2 corr").