Here is some sample data for my problem:

mydf <- data.frame(A = rnorm(20, 1, 5),
                   B = rnorm(20, 2, 5),
                   C = rnorm(20, 3, 5),
                   D = rnorm(20, 4, 5),
                   E = rnorm(20, 5, 5))

Now I'd like to run a one-sample t-test on each column of the data.frame to test whether its mean differs significantly from zero, like t.test(mydf$A), and then store the mean of each column, the t-value, and the p-value in a new data.frame. So the result should look something like this:

      A    B    C    D    E
mean  x    x    x    x    x
t     x    x    x    x    x
p     x    x    x    x    x

I can think of some tedious ways to do this, like looping through mydf, calculating the parameters, and then looping through the new data.frame and inserting the values.
But with packages like plyr at hand, shouldn't there be a more concise and elegant way to do this?

Any ideas are highly appreciated.

vincentqu
  • [This](http://stackoverflow.com/questions/13109652/r-output-without-1-how-to-nicely-format) also might help you if you are using `regress`. – Metrics Jun 29 '13 at 20:56

2 Answers

Try something like this and then extract the results you want from the resulting table:

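# Run a one-sample t-test on every column; lapply() returns a named list of htest objects
results <- lapply(mydf, t.test)
# Bind the htest objects column-wise: rows are htest components, columns are variables
resultsmatrix <- do.call(cbind, results)
# Keep only the t statistic, the estimated mean, and the p-value
resultsmatrix[c("statistic","estimate","p.value"),]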

Gives you:

          A         B          C            D           E           
statistic 1.401338  2.762266   5.406704     3.409422    5.024222    
estimate  1.677863  2.936304   5.418812     4.231458    5.577681    
p.value   0.1772363 0.01240057 3.231568e-05 0.002941106 7.531614e-05
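
If you want the exact layout from the question (a plain numeric data.frame with rows mean, t and p), a variant with sapply() might be simpler than subsetting the list matrix. This is only a sketch: the set.seed() call and the names res and res_df are my own additions for illustration, not part of the question.

```r
# Assumption: the seed is added only so that the sketch is reproducible
set.seed(42)
mydf <- data.frame(A = rnorm(20, 1, 5),
                   B = rnorm(20, 2, 5),
                   C = rnorm(20, 3, 5),
                   D = rnorm(20, 4, 5),
                   E = rnorm(20, 5, 5))

# For each column, return a named numeric vector of length 3;
# sapply() then simplifies the results to a 3 x 5 numeric matrix
res <- sapply(mydf, function(x) {
  tt <- t.test(x)
  c(mean = unname(tt$estimate),
    t    = unname(tt$statistic),
    p    = tt$p.value)
})

# Convert to a data.frame; the row names mean, t and p are kept
res_df <- as.data.frame(res)
res_df
```

Unlike the do.call(cbind, ...) result, every cell here is an ordinary number, so res_df["p", ] < 0.05 works directly.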
Thomas

A data.table solution:

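library(data.table)
DT <- as.data.table(mydf)
# Apply t.test() to every column in .SD (all columns, since no grouping is used)
DT[, lapply(.SD, function(x) {
         y <- t.test(x)
         list(p  = round(y$p.value, 2),    # p-value
              h  = round(y$conf.int, 2),   # 95% confidence interval (lower, upper)
              mm = round(y$estimate, 2))   # sample mean
     })]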

           A          B         C         D         E
1:        0.2       0.42      0.01         0         0
2: -0.91,3.98 -1.15,2.62 1.19,6.15 2.82,6.33 2.68,6.46
3:       1.54       0.74      3.67      4.57      4.57
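
Since a data.table has no row names, one way to keep track of which row is which might be an explicit label column. A minimal sketch, assuming data.table is installed; the seed, the column name stat and the object name out are hypothetical choices of mine, and I return the mean, t-value and p-value from the question instead of the confidence interval.

```r
library(data.table)

# Assumption: the seed is added only so that the sketch is reproducible
set.seed(42)
mydf <- data.frame(A = rnorm(20, 1, 5),
                   B = rnorm(20, 2, 5),
                   C = rnorm(20, 3, 5),
                   D = rnorm(20, 4, 5),
                   E = rnorm(20, 5, 5))
DT <- as.data.table(mydf)

# Each column yields a numeric vector of length 3, so the result has 3 rows
out <- DT[, lapply(.SD, function(x) {
  y <- t.test(x)
  c(unname(y$estimate), unname(y$statistic), y$p.value)
})]

# Add a label column in place of row names and move it to the front
out[, stat := c("mean", "t", "p")]
setcolorder(out, "stat")
out
```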
agstudy
  • Might be nice to have row names. Also, I tried to format your code, but it just requires a carriage return to format correctly, so I didn't hit the 6 character minimum edit. – Thomas Jun 29 '13 at 20:56
  • @Thomas Thanks, I was away. But a data.table has no row names. – agstudy Jun 29 '13 at 21:18
  • Is there a conceptual advantage of `data.table` that justifies the additional code, compared to the solution from @Thomas? – vincentqu Jun 30 '13 at 17:23