How can I compare two lists of means in a dataframe, by row

Question

I am trying to use a t-test to compare two lists of mean gene-expression values.

My matrix is built like this:

```r
col1 <- c(6.7, 8.4, 3.1)
col2 <- c(7.7, 8.8, 3.6)
matrix <- cbind(col1, col2)
rownames(matrix) <- c("gene1", "gene2", "gene3")
```

I want to get the p-value for each gene. All I know is that col1 contains means calculated from 22 samples and col2 contains means calculated from 30 samples.

I tried to apply a t-test per row, but it is not working.

```r
apply(t.test, matrix$col1, matrix$col2, 1)
```
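For reference, `apply()` takes the data first, then the margin, then the function (`apply(X, MARGIN, FUN)`), so the call above cannot work as written. A per-gene t-test is only possible with replicate measurements for each gene, which the question says are not available. The sketch below therefore *simulates* such replicates (the matrices `grp1` and `grp2` are hypothetical, generated with `rnorm()`) purely to show what the per-gene syntax would look like.

```r
# Hypothetical replicate data: the question only has means, so these
# matrices are simulated purely to illustrate the per-gene syntax.
set.seed(1)
genes <- c("gene1", "gene2", "gene3")
grp1 <- matrix(rnorm(3 * 22, mean = c(6.7, 8.4, 3.1)), nrow = 3,
               dimnames = list(genes, NULL))  # 3 genes x 22 samples
grp2 <- matrix(rnorm(3 * 30, mean = c(7.7, 8.8, 3.6)), nrow = 3,
               dimnames = list(genes, NULL))  # 3 genes x 30 samples

# apply() iterates over the rows of ONE matrix; here each test needs a row
# from two matrices, so loop over the gene names instead.
p_values <- sapply(genes, function(g) t.test(grp1[g, ], grp2[g, ])$p.value)
p_values
```

`sapply()` over the gene names returns a named vector, one (Welch) p-value per gene.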

Sorry, Alain. You cannot calculate a t-test for each gene from single per-gene means. A t-test needs replicate measurements to estimate the variance. — Stephen Henderson, Jun 21 '19 at 10:18
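To see this concretely in R: with a single value per group, `t.test()` has no way to estimate a variance and stops with an error instead of returning a p-value. The two numbers below are the gene1 means from the question.

```r
# One value per group: no variance can be estimated, so t.test() refuses.
res <- tryCatch(t.test(6.7, 7.7), error = function(e) conditionMessage(e))
print(res)
```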
**1.** Don't overwrite function names; check first, e.g. with `?matrix`, before defining a new name (`matrix` is actually quite an important function). **2.** Applying `$` to a matrix won't work; use `[, "col*"]` instead. **3.** Try `mat <- cbind(col1, col2); t.test(mat[, "col1"], mat[, "col2"])`. — jay.sf, Jun 21 '19 at 11:23

score 1 · Answer 1 · answered Jun 21 '19 at 11:58

I think you need to do a better job of defining what, exactly, it is that you want to compare. There's no such thing as a p-value of a mean. What are you comparing: base-pair variance between a gene in column 1 and one in column 2? Or is column 1 the full sequence of one gene and column 2 the full sequence of a second gene? Your question doesn't make clear what you're analyzing, and without that you may have good math that means nothing.

Here's a good definition of the t-test, assuming that it is, in fact, the test you ought to be using. Note that this test requires not only the difference between the means (which you could calculate from what you showed us) but also the standard deviation of each group (which you didn't give) and the number of items in each group (which you did). So we have only two of the three necessary inputs. To get the third, you need to either supply it or supply the raw data that produced it.
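A minimal sketch of how the three inputs combine: the Welch t-test (the default in R's `t.test()`) computed from summary statistics alone. The helper name `welch_from_summary` is hypothetical, and the standard deviations in the usage call are placeholders. The question does not provide them, so the resulting p-values mean nothing until real SDs are supplied.

```r
# Welch two-sample t-test from summary statistics only (hypothetical helper).
# m = group means, s = group standard deviations, n = group sizes.
# Arguments may be vectors (one element per gene).
welch_from_summary <- function(m1, m2, s1, s2, n1, n2) {
  se2_1 <- s1^2 / n1   # squared standard error of mean 1
  se2_2 <- s2^2 / n2   # squared standard error of mean 2
  t_stat <- (m1 - m2) / sqrt(se2_1 + se2_2)
  # Welch-Satterthwaite approximation for the degrees of freedom
  df <- (se2_1 + se2_2)^2 / (se2_1^2 / (n1 - 1) + se2_2^2 / (n2 - 1))
  p_value <- 2 * pt(-abs(t_stat), df)   # two-sided p-value
  data.frame(t = t_stat, df = df, p.value = p_value)
}

mat <- cbind(col1 = c(6.7, 8.4, 3.1), col2 = c(7.7, 8.8, 3.6))
rownames(mat) <- c("gene1", "gene2", "gene3")

# s1 and s2 are PLACEHOLDER standard deviations, not values from the question.
welch_from_summary(m1 = mat[, "col1"], m2 = mat[, "col2"],
                   s1 = 1, s2 = 1, n1 = 22, n2 = 30)
```

When raw data are available, `t.test(x, y)` gives the same p-value as this helper fed with `mean()`, `sd()` and `length()` of `x` and `y`, which is a quick way to check it.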

Thanks a lot for your clear answer, DanM; you got the point: what I am missing is the variance of the data. As I don't have access to the raw data, how can I supply it, or perform an appropriate test to compare the two means for the same gene while taking the number of items into account? — Alain_LU, Jun 24 '19 at 14:43
I'm afraid you can't, @Alain_LU. It's like those TV shows where they refine a pixelated image to read a license plate: it makes for good TV, but you can't do it in the real world. Here's an example: the number sets (1, 10, 19) and (8, 10, 12) both average to 10, and both have an n of 3. But if all you have is a mean of 10 and an n of 3, there's no way you can extrapolate the variance. From what you've described of your data, you're in the same position. You can't get there from here ... sorry! — DanM, Jun 24 '19 at 18:13
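DanM's example is easy to check in R: both sets have the same mean and the same n but very different standard deviations, so a mean and an n alone cannot pin down the variance.

```r
# DanM's two example sets: same mean, same n, very different spread
a <- c(1, 10, 19)
b <- c(8, 10, 12)
c(mean(a), mean(b))  # 10 10
c(sd(a), sd(b))      # 9 2
```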
