I found part of the solution to my problem here: How to calculate correlation In R
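```r
library(plyr)  # provides ddply() and summarize()

set.seed(123)
X <- data.frame(ID = rep(1:2, each = 5), a = sample(1:10), b = sample(1:10))
# Pearson's r of a and b, computed separately for each ID
ddply(X, .(ID), summarize, cor_a_b = cor(a, b))
```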
In addition to `cor` (which calculates Pearson's r), I run `cor.test` to get the p-value. But `cor.test` fails with the error "not enough finite observations" when an ID has too few rows, for example only a single row, which happens quite often in my data.
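For Pearson's r, `cor.test` stops with this error when fewer than 3 complete pairs are available. A minimal sketch of one way to keep going is to catch the error and return `NA` instead; `safe_cor_p` is a hypothetical helper name, and note that it turns *any* error into `NA`, not only the "not enough finite observations" case.

```r
# Hypothetical helper: p-value from cor.test(), or NA if cor.test() raises
# an error (e.g. "not enough finite observations" with fewer than 3 pairs).
safe_cor_p <- function(a, b) {
  tryCatch(cor.test(a, b)$p.value, error = function(e) NA_real_)
}

safe_cor_p(c(1, 2), c(3, 5))                         # only 2 pairs, so NA
safe_cor_p(1:10, c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9))   # enough pairs, a p-value
```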
So I want to calculate r only when there are more than about 30 pairs of observations; with fewer pairs, I want NA.
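One way to express the threshold is to count the complete (a, b) pairs per group first and only call `cor.test` when that count is large enough. This is a sketch under assumptions: `cor_summary` and the `min_n` argument are hypothetical names, the cutoff of 30 is taken from the rule of thumb above (it should stay at 3 or more so `cor.test` can run), and the example data is made up.

```r
# Hypothetical helper: Pearson's r and p-value for one group, or NA for both
# when the group has fewer than min_n complete (a, b) pairs.
cor_summary <- function(d, min_n = 30) {
  ok <- complete.cases(d$a, d$b)   # rows where both a and b are present
  n <- sum(ok)
  if (n < min_n) {
    return(data.frame(n = n, r = NA_real_, p = NA_real_))
  }
  ct <- cor.test(d$a[ok], d$b[ok])
  data.frame(n = n, r = unname(ct$estimate), p = ct$p.value)
}

# Made-up example data: ID 1 has 40 rows, ID 2 is a "solo" ID with 1 row
set.seed(123)
X <- data.frame(ID = c(rep(1, 40), 2), a = rnorm(41), b = rnorm(41))

# Apply per ID in base R: split by ID, summarise each group, stack the rows
groups <- split(X, X$ID)
res <- do.call(rbind, lapply(names(groups), function(id) {
  data.frame(ID = id, cor_summary(groups[[id]], min_n = 30))
}))
res
```

With plyr loaded, the same helper should also work per group as `ddply(X, .(ID), cor_summary, min_n = 30)`, since `ddply` passes extra arguments on to the function.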
The second problem is that the verbose output of `cor.test` bloats the resulting data frame, even though the only thing I want is the p-value (assuming the p-value is what I think it is). Is it the significance of r?
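`cor.test` returns an object of class `"htest"`, which is a list, so the individual numbers can be picked out instead of keeping the whole printout. Its `p.value` element is the p-value for the null hypothesis that the true correlation is 0 (two-sided by default). A small sketch with made-up data:

```r
# Made-up example data with a genuine correlation between a and b
set.seed(123)
a <- rnorm(50)
b <- a + rnorm(50)

ct <- cor.test(a, b)        # an "htest" object, i.e. a list
r <- unname(ct$estimate)    # Pearson's r (the element is named "cor")
p <- ct$p.value             # two-sided p-value for H0: true correlation = 0
c(r = r, p = p)
```

Inside the original call this would look like `ddply(X, .(ID), summarize, r = cor(a, b), p = cor.test(a, b)$p.value)`, which still stops on IDs with fewer than 3 pairs unless it is combined with a guard like the ones above.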
The only method I know for testing the significance of r is the t-test.
The t statistic for r, with n the number of pairs, is

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

but t is not the significance itself; otherwise I would try to implement this formula inside the `ddply` statement.
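The missing step from t to a p-value is the Student t distribution with n − 2 degrees of freedom: the two-sided p-value is twice the tail probability beyond |t|, which R's `pt` gives directly. A minimal sketch, assuming complete pairs, Pearson's r and the default two-sided alternative; `cor_p_from_r` is a hypothetical name and the data is made up:

```r
# Hypothetical helper: two-sided p-value for Pearson's r from n pairs,
# via t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.
cor_p_from_r <- function(r, n) {
  t <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t), df = n - 2)
}

# Compare the manual p-value with the one reported by cor.test()
set.seed(123)
a <- rnorm(50)
b <- a + rnorm(50)
p_manual <- cor_p_from_r(cor(a, b), length(a))
p_cortest <- cor.test(a, b)$p.value
all.equal(p_manual, p_cortest)   # should be TRUE
```

Because it only needs r and n, this helper could be called inside `summarize`, e.g. `p = cor_p_from_r(cor(a, b), length(a))`, without carrying the rest of the `cor.test` output along.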