
I am trying to summarize data across two variables, and the output of summarize is very long (at least in the R Notebook output, where the table breaks over multiple pages). I'd like one variable as the rows of the summary output and the other as the columns, with the mean for each row/column combination in the cells. Some example data:

dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = sample(1:2, size = 4, replace = TRUE),
  value = rnorm(12)
)

Then I would usually get my summary data frame like this:

library(dplyr)
dat1 %>% group_by(category, age) %>% summarize(mean(value))

which looks like this: [screenshot: long table with one row per category/age combination]

But in my actual data each of the variables has 10+ levels, so the table is very long and hard to read. I would prefer something like this, which I created using:

dat1 %>%
  group_by(category) %>%
  summarize(mean.age1 = mean(value[age == 1]),
            mean.age2 = mean(value[age == 2]))
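For comparison, base R's tapply() can build the same category-by-age table of means without naming each column by hand. This is a minimal sketch, not from the original post; it reuses the question's example data but, as an assumption, replaces the random age column with rep(1:2, times = 6) so that both ages always appear.

```r
# Question's example data; age is made deterministic (an assumption) so both ages occur
set.seed(1)
dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = rep(1:2, times = 6),
  value = rnorm(12)
)

# tapply() splits `value` by every category/age combination and applies mean(),
# returning a matrix with categories as rows and ages as columns
means <- tapply(dat1$value, list(dat1$category, dat1$age), mean)
print(means)
```

The result is a matrix rather than a data frame; wrap it in as.data.frame() if you need one.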

[screenshot: one row per category, with columns mean.age1 and mean.age2]

There must be a better way than hand-coding each mean column?

Esther


You just need tidyr in addition to dplyr, and then you can do something like this:

library(dplyr)
library(tidyr)

dat1 %>%
  group_by(category, age) %>%
  summarise(mean = mean(value)) %>%   # one row per category/age pair
  spread(age, mean, sep = '')         # ages become columns age1, age2

Output is as follows:

Source: local data frame [3 x 3]
Groups: category [3]

  category      age1      age2
*   <fctr>     <dbl>     <dbl>
1     catA 0.2930104 0.3861381
2     catB 0.5752186 0.1454201
3     catC 1.0845645 0.3117227
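In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The same reshaping can be sketched with it as follows. This assumes dplyr >= 1.0.0 (for the .groups argument) and, as an assumption for reproducibility, uses a deterministic age column so both age levels are present.

```r
library(dplyr)
library(tidyr)

# Example data in the question's shape; age is deterministic here (an assumption)
set.seed(1)
dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = rep(1:2, times = 6),
  value = rnorm(12)
)

wide <- dat1 %>%
  group_by(category, age) %>%
  summarise(mean = mean(value), .groups = "drop") %>%
  # names_prefix turns the age values 1 and 2 into column names age1 and age2
  pivot_wider(names_from = age, values_from = mean, names_prefix = "age")

print(wide)
```

Dropping the groups in summarise() also means the result is a plain tibble, without the "Groups: category" line shown in the output above.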
Gopala