
I have a relatively large dataset, and I want to print a table of means and standard deviations for combinations of factors. I would like to have them in a format like this:

         A            B
test1    2.0 (1.0)    5.0 (2.0)
test2    6.3 (3.1)    2.1 (0.7)

Is there an easy way to do this?

The closest I have come is using the `tables::tabular` function (minimal example):

# Example data
df = data.frame(
   group=c('A', 'A',  'A', 'B', 'B', 'B'),
   value=c(1,2,3,6,8,9))

# Print table     
library(tables)
tabular(value ~ group * (mean + sd), df)

... which outputs this:

       group               
       A        B          
       mean  sd mean  sd   
 value 2     1  7.667 1.528

But I haven't figured out a neat way to transform this format to the mean (SD) format above. Note: These examples are very minimal. I will have a larger hierarchy (currently 4 x (mean+sd) columns and 2 x 3 rows) but the fundamental problem is the same.

Brian Tompsett - 汤莱恩
Jonas Lindeløv

2 Answers


With data.table loaded, we can use `dcast` (including your `test` variable):

library(data.table)

df = data.frame(
  group=c('A', 'A',  'A', 'B', 'B', 'B','A', 'A',  'A', 'B', 'B', 'B'),
  value=c(1,2,3,6,8,9,1,2,3,6,8,9),
  test=c(1,1,1,1,1,1,2,2,2,2,2,2))

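# Cast to one row per test and one column per group,
# formatting each cell as "mean (sd)" rounded to one decimal
dcast(df, test ~ group, value.var = "value", fun.aggregate = function(x) {
  paste(round(mean(x), 1), " (", round(sd(x), 1), ")", sep = "")
})

#   test     A         B
# 1    1 2 (1) 7.7 (1.5)
# 2    2 2 (1) 7.7 (1.5)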
Chris
  • FYI, you're using `dcast` from the reshape2 package there; you can load that package instead. – Frank Aug 08 '16 at 19:45
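
To stay within data.table itself rather than the reshape2 `dcast` mentioned in the comment, the data frame can be converted to a data.table first. This is only a minimal sketch that reuses the example data from the answer; the names `dt`, `result` and `fmt_mean_sd` are introduced here for illustration and are not part of the original answer.

```r
library(data.table)

# Same example data as in the answer
df <- data.frame(
  group = c('A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'B', 'B', 'B'),
  value = c(1, 2, 3, 6, 8, 9, 1, 2, 3, 6, 8, 9),
  test  = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2))

# Hypothetical helper: format a numeric vector as "mean (sd)" with one decimal
fmt_mean_sd <- function(x) sprintf("%0.1f (%0.1f)", mean(x), sd(x))

# Convert to a data.table so that data.table's own dcast method is dispatched
dt <- as.data.table(df)
result <- dcast(dt, test ~ group, value.var = "value",
                fun.aggregate = fmt_mean_sd)
print(result)
```

Using `sprintf` instead of `paste(round(...))` keeps trailing zeros, so a mean of 2 is shown as 2.0.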
library(reshape2)

formatted.table <- dcast(df, 'value' ~ group, fun.aggregate = function(x) {
    return(sprintf('%0.1f (%0.1f)', mean(x), sd(x)))
})

# "value"         A         B
#   value 2.0 (1.0) 7.7 (1.5)

Similar to Chris's answer, but a little bit cleaner (and no "test" variable needed).

You can also do this type of aggregation with the dplyr package.
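
As a rough sketch of what such a dplyr aggregation might look like, assuming the minimal `df` from the question (the column name `mean_sd` and the object `summary_tbl` are made up here):

```r
library(dplyr)

# Minimal example data from the question
df <- data.frame(
  group = c('A', 'A', 'A', 'B', 'B', 'B'),
  value = c(1, 2, 3, 6, 8, 9))

# One row per group, with the cell formatted as "mean (sd)"
summary_tbl <- df %>%
  group_by(group) %>%
  summarise(mean_sd = sprintf('%0.1f (%0.1f)', mean(value), sd(value)))

print(summary_tbl)
```

Note that this gives a long table with one row per group; getting the groups into columns, as `dcast` does, would need an extra reshaping step.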

jdobres
  • You don't need the test variable, but it was in his sample frame at the beginning... Like the `sprintf` though! – Chris Aug 08 '16 at 20:10
  • Thanks, this does the trick! I can see now that I was a bit too minimal in my example. Actually, I have the test1 and test2 as separate *columns* in my data.frame, not as levels in a factor. `dcast` only takes one `value.var` - or is there a way? I could always do a `melt` before calling `dcast`. – Jonas Lindeløv Aug 08 '16 at 20:21
  • Melting the test1 and test2 columns and including the new column in your `dcast` call would be the way to do it. Hadley Wickham's `dplyr` package also now has a `summarize_all` function which can accomplish this. – jdobres Aug 08 '16 at 20:25
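
The melt-then-cast route from the last two comments could look roughly like this. It is a sketch under assumptions: the data frame `df_wide` and its values are invented for illustration, with `test1` and `test2` as separate numeric columns as described in the comment.

```r
library(reshape2)

# Hypothetical data: one column per test rather than a 'test' factor
df_wide <- data.frame(
  group = c('A', 'A', 'A', 'B', 'B', 'B'),
  test1 = c(1, 2, 3, 6, 8, 9),
  test2 = c(2, 4, 6, 1, 2, 3))

# Stack test1 and test2 into a single 'value' column,
# with a 'test' column recording which test each value came from
df_long <- melt(df_wide, id.vars = "group",
                measure.vars = c("test1", "test2"),
                variable.name = "test", value.name = "value")

# One row per test, one column per group, each cell "mean (sd)"
formatted.table <- dcast(df_long, test ~ group, value.var = "value",
                         fun.aggregate = function(x) {
  sprintf('%0.1f (%0.1f)', mean(x), sd(x))
})

print(formatted.table)
```

With these made-up numbers, the `test1` row comes out as 2.0 (1.0) for A and 7.7 (1.5) for B, which matches the layout requested in the question.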