
I have a really large CSV file (dat2) and I'm trying to use cor.test to get a p-value comparing one column (Age) to each of the other columns (every column after Age). Then I need to print each p-value. I got it to work using a for loop, but it takes a really long time. I want to use an apply function because I think it will shorten the run time?

The commented-out (##) portion is the for loop that works.

b <- apply(dat2[,-1], 1(,4:ncol(dat2)), cor.test(dat2(Age), method="pearson", use="pairwise"))
sapply(b, "[[", "p.value")

## for (i in 4:ncol(dat2)) {
## a <- cor.test(dat2[,3], dat2[,i], method="pearson", use="pairwise")
## print(paste(colnames(dat2)[i], " p-value:", a$p.value))
## }
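A likely reason the loop feels slow is the per-column `cor.test()` call and the printing inside the loop. `apply()` is itself a loop, so switching to it alone usually does not make this much faster. A minimal sketch that keeps the loop but collects the p-values into a pre-allocated named vector instead of printing each one. It assumes, as in the loop above, that Age is column 3 and the columns to test start at column 4; `dat_demo`, `v1` and `v2` are made-up stand-ins for `dat2`.

```r
# Hypothetical stand-in for dat2: two leading columns, Age in column 3
# (as in the loop above), and the columns to test from column 4 onward
set.seed(1)
n <- 20
age <- round(runif(n, 20, 80))
dat_demo <- data.frame(
  id  = seq_len(n),
  sex = rep(1:2, length.out = n),
  Age = age,
  v1  = rnorm(n),                      # unrelated to Age
  v2  = age / 10 + rnorm(n, sd = 0.5)  # strongly related to Age
)

cols <- 4:ncol(dat_demo)
# Pre-allocate a named numeric vector instead of printing inside the loop
p_loop <- setNames(numeric(length(cols)), colnames(dat_demo)[cols])
for (k in seq_along(cols)) {
  p_loop[k] <- cor.test(dat_demo[, 3], dat_demo[, cols[k]],
                        method = "pearson")$p.value
}

# Print once at the end
print(p_loop)
```

Printing once at the end also leaves `p_loop` available for further use, for example `p.adjust(p_loop)` when many columns are tested.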
hawkeye03
  • To see how this works, run `apply(dat2[, -1], 2, print)` and check the output. Depending on the margin, `apply` iterates over rows (1) or columns (2) one at a time and applies the function to each. – cropgen Apr 05 '19 at 18:55
  • This post shows ways that should be faster to compute than `apply(..., 2, cor.test)`, especially because you only care about the p-values: [A matrix version of cor.test()](https://stackoverflow.com/questions/13112238/a-matrix-version-of-cor-test) – markus Apr 05 '19 at 19:09
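The idea behind a matrix version: for Pearson's test, `cor.test()` uses t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom. Once every correlation is computed in a single `cor()` call, all p-values follow from vectorized arithmetic, with no per-column test object. A sketch of that idea, not the code from the linked post; `cor_pvals` is a hypothetical helper name, and n is counted per column so that missing values are handled the way `cor.test()` handles them (complete pairs only).

```r
# Vectorised two-sided Pearson p-values: one vector x against every column of Y
cor_pvals <- function(x, Y) {
  Y <- as.matrix(Y)
  r <- drop(cor(x, Y, use = "pairwise.complete.obs"))  # one r per column
  n <- colSums(!is.na(Y) & !is.na(x))                  # complete pairs per column
  df <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)
  p <- 2 * pt(-abs(t_stat), df)
  names(p) <- colnames(Y)
  p
}

# Made-up data with a couple of missing values
set.seed(42)
x <- rnorm(30)
Y <- cbind(a = rnorm(30), b = x + rnorm(30), c = rnorm(30))
Y[c(3, 7), "c"] <- NA

p_fast <- cor_pvals(x, Y)
# Reference: one cor.test() per column; should agree up to floating-point tolerance
p_ref <- sapply(colnames(Y), function(j) cor.test(x, Y[, j])$p.value)
```

Columns with zero standard deviation still give `NA` (with a warning from `cor()`), just as `cor.test()` does.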

1 Answer


You were on the right track, but there were a few mistakes. The following code should produce the output you want:

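# MARGIN = 2: one call per column of df2[, -1]; x is that column.
# use = "pairwise" is ignored by cor.test(), which drops incomplete pairs itself.
b = apply(df2[, -1], 2, function(x) {
    cor.test(df2[, 1], x, method = "pearson", use = "pairwise")
})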

p.vals <- sapply(b, "[[", "p.value")
p.vals

country   value 
      0       0 
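The answer works because MARGIN = 2 passes each column to the function. Two related points are sketched below with made-up data (`df_demo`, `bp`, `wt` and `group` are hypothetical). First, `apply()` converts a data frame with `as.matrix()`, so a single text column turns every column into character. Second, a data frame is already a list of columns, so `sapply()` can iterate over the columns directly and return the p-values in one step, without storing whole `cor.test` objects.

```r
set.seed(7)
age <- round(runif(25, 18, 90))
df_demo <- data.frame(
  id  = seq_len(25),
  Age = age,
  bp  = 90 + 0.5 * age + rnorm(25, sd = 5),  # depends on Age
  wt  = rnorm(25, mean = 70, sd = 10)         # unrelated to Age
)

# Pitfall: apply() coerces via as.matrix(), so one text column
# makes every column character (is.numeric() is FALSE everywhere)
df_text <- cbind(df_demo, group = "a")
all_char <- !any(apply(df_text, 2, is.numeric))

# sapply() walks the data frame's columns directly (no matrix conversion);
# df_demo[-(1:2)] drops id and Age and always stays a data frame
p_vals <- sapply(df_demo[-(1:2)], function(col) {
  cor.test(df_demo$Age, col, method = "pearson")$p.value
})
p_vals
```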
cropgen
  • Thank you, I am really new at R! The p-values are all NA, and it is also printing a result for Age. I only want to compare the columns after Age (columns 3 onward) to Age (column 2), and then print the p-values for columns 3 onward. The warning message is: 1: In cor(x, y) : the standard deviation is zero. – hawkeye03 Apr 05 '19 at 19:42
  • In that case, update the `cor.test` call to use the appropriate columns. If Age is column 2, use `apply(df2[, -c(1:2)], 2, ...)` and `cor.test(df2[, 2], x, method = "pearson", use = "pairwise")`. If a variable has zero standard deviation (all its values are identical), the correlation can't be computed, hence the NA p-value. – cropgen Apr 05 '19 at 19:48
  • Perfect, that gives me the correct result. Feel free to ignore these questions, but I don't understand why you use -c(1:2). Shouldn't that argument be the matrix that you want to apply the function to? – hawkeye03 Apr 07 '19 at 20:26
  • It is: the minus sign drops the first two columns, so `df2[, -c(1:2)]` is the matrix without them, and `apply` only iterates over the remaining columns. – cropgen Apr 07 '19 at 21:52
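Putting the thread's fixes together, here is a sketch under these assumptions: Age is column 2, the columns to test start at column 3, and a constant column should yield `NA` without triggering the "standard deviation is zero" warning. `df_final`, its columns, and `safe_p` are hypothetical names for illustration only.

```r
# Hypothetical data shaped like the thread describes: an id column,
# Age in column 2, and the columns to test from column 3 onward
df_final <- data.frame(
  id   = 1:6,
  Age  = c(25, 32, 47, 51, 60, 68),
  flat = rep(5, 6),                     # constant: correlation undefined
  x1   = c(1.2, 1.9, 2.8, 3.1, 3.9, 4.4),
  x2   = c(3, 1, 4, 1, 5, 9)
)

# Hypothetical helper: NA (no warning) when a p-value cannot be computed,
# otherwise the Pearson cor.test() p-value on complete pairs
safe_p <- function(x, y) {
  ok <- complete.cases(x, y)
  if (sum(ok) < 3 || sd(x[ok]) == 0 || sd(y[ok]) == 0) {
    return(NA_real_)
  }
  cor.test(x[ok], y[ok], method = "pearson")$p.value
}

# -c(1:2) (same as -(1:2)) drops columns 1 and 2, so only columns 3+ are tested
p_final <- sapply(df_final[, -c(1:2)], function(col) safe_p(df_final$Age, col))
p_final
```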