
I have a huge matrix of this form, with 1000000 rows and 10000 columns. This is a toy example:

A B C Mean
1 3 4 2.66
2 4 3 3
1 3 4 2.66
9 9 9 9
1 3 2 2
2 4 5 3
1 2 6 3
2 3 5 3.33

The "Mean" column holds the mean of A, B and C for each row, and the global mean of the "Mean" column is 3.58. Using a t-test in R, I would like to know whether the mean of each row is significantly higher than the global mean. How can I get the p-values for these comparisons? Comparing the means of two groups is simple with t.test(), but I cannot find how to compare a single value with the mean of a group that includes that value.
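A minimal sketch of the toy example as an R matrix (the names `m`, `row.means` and `global.mean` are illustrative, not from the question), keeping only the raw columns A, B and C and recomputing the "Mean" column with rowMeans():

```r
# Toy data from the example above: one row per observation, raw columns only
m <- matrix(c(1, 3, 4,
              2, 4, 3,
              1, 3, 4,
              9, 9, 9,
              1, 3, 2,
              2, 4, 5,
              1, 2, 6,
              2, 3, 5),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C")))

# Per-row mean (the "Mean" column) and the global mean of those row means
row.means <- rowMeans(m)
global.mean <- mean(row.means)

print(round(row.means, 2))
print(round(global.mean, 2))
```

Recomputed this way, the row 2 4 5 has a mean of about 3.67 rather than the 3 listed in the table, so the global mean comes out at about 3.67 rather than 3.58.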

Lucas
  • This is a statistics question, better suited for Cross Validated (CV). – Roman Luštrik Mar 08 '18 at 20:55
  • Hi @RomanLuštrik, I asked a similar question on CV long ago, but nobody answered. I am sure that many people working in statistics/R on SO will read this post, and that I have a better chance of getting a response from SO users. – Lucas Mar 08 '18 at 21:08

1 Answer


I strongly agree with Roman that you should go back to CV, since this approach is likely to give you a number of false positives.
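To illustrate the false-positive concern from running one test per row, here is a small simulation sketch (the sizes `n.rows` and `n.cols` are arbitrary assumptions, far smaller than the real data). Every row is drawn with true mean 0, so any row flagged at the 0.05 level is a false positive; without correction roughly 5% of rows are flagged, while p.adjust() brings that count down sharply:

```r
set.seed(1)
n.rows <- 10000   # number of rows, i.e. number of tests (assumed for illustration)
n.cols <- 3       # values per row, as in the toy example
x <- matrix(rnorm(n.rows * n.cols, mean = 0), nrow = n.rows)

# Every row truly has mean 0, so every "significant" row is a false positive
pvals <- apply(x, 1, function(r) t.test(r, mu = 0)$p.value)

n.raw  <- sum(pvals < 0.05)                                  # about 5% of rows
n.bonf <- sum(p.adjust(pvals, method = "bonferroni") < 0.05) # family-wise control
n.bh   <- sum(p.adjust(pvals, method = "BH") < 0.05)         # false discovery rate
cat("raw:", n.raw, " Bonferroni:", n.bonf, " BH:", n.bh, "\n")
```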

But in terms of your R question, you could try a one-sample t-test here:

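global.mean <- 3.58
# val.matrix should hold only the raw values (columns A, B, C), not "Mean"
val.matrix <- matrix(c(...),...)

# rows with zero variance (e.g. 9 9 9) make t.test() stop with
# "data are essentially constant", so return NA for them instead
pvals <- apply(val.matrix, 1, function(r)
  tryCatch(t.test(r, mu = global.mean)$p.value, error = function(e) NA_real_))
### should do a multiple comparison correction here, e.g.,
### p.adjust(pvals, method = "bonferroni")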

This gives you a vector of length nrow(val.matrix), where each element is the p-value of a two-sided one-sample t-test of whether the values in that row differ significantly from 3.58. I'm not advocating this statistical approach, but this is how you could implement it.
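Since the question asks whether each row is significantly *higher* than the global mean, a one-sided test (alternative = "greater" in t.test()) matches it more closely. For a million rows, calling t.test() through apply() is also slow, so the sketch below computes the same one-sample t statistics for all rows at once with rowMeans(), rowSums() and pt(), then applies a Benjamini-Hochberg correction with p.adjust(). It assumes the raw columns A, B, C sit in a numeric matrix and reuses 3.58 as mu; the helper name `row_t_test_greater` is hypothetical:

```r
# Toy data: raw columns A, B, C only (not the "Mean" column)
val.matrix <- matrix(c(1, 3, 4,
                       2, 4, 3,
                       1, 3, 4,
                       9, 9, 9,
                       1, 3, 2,
                       2, 4, 5,
                       1, 2, 6,
                       2, 3, 5),
                     ncol = 3, byrow = TRUE)
global.mean <- 3.58

# Hypothetical helper: one-sided one-sample t-test (H1: row mean > mu) for
# every row at once; matches t.test(r, mu = mu, alternative = "greater")
row_t_test_greater <- function(x, mu) {
  n <- ncol(x)
  row.mean <- rowMeans(x)
  # x - row.mean recycles row.mean down the columns, centring each row
  row.sd <- sqrt(rowSums((x - row.mean)^2) / (n - 1))
  t.stat <- (row.mean - mu) / (row.sd / sqrt(n))
  p <- pt(t.stat, df = n - 1, lower.tail = FALSE)
  # zero-variance rows give an infinite or undefined t statistic; report NA,
  # since t.test() refuses to run on constant data
  p[row.sd == 0] <- NA_real_
  p
}

pvals <- row_t_test_greater(val.matrix, global.mean)

# Benjamini-Hochberg adjustment for the many row-wise tests (NAs are skipped)
pvals.adj <- p.adjust(pvals, method = "BH")

print(round(pvals, 4))
print(round(pvals.adj, 4))
```

At the size given in the question (1,000,000 × 10,000 values, about 80 GB as doubles), the matrix would probably have to be read and tested in blocks of rows, collecting the raw p-values before calling p.adjust() once on all of them.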

Daniel