
Short version of question: How can I use ddply to summarize my dataframe grouped by several variables?

I currently use this code to summarize by Condition:

ddply(ExampleData, .(Condition), summarize,  Average=mean(Var1, na.rm=TRUE), SD=sd(Var1),N=length(Var1), Med =median(Var1))

How can I adjust the code to summarize by two variables (Condition and Block)?

The desired output format is something like:

  Condition Block Average SD  N Med
1         A     1    0.50 .. ..  ..
2         A     2    0.80 .. ..  ..
3         B     1    0.90 .. ..  ..
4         B     2    0.75 .. ..  ..

====

Longer version of question with example data.

Dataframe:

ExampleData <- structure(list(Condition = c("A", "A", "A", "B", "B", "B"), Block = c(1, 
2, 1, 2, 1, 2), Var1= c(0.6, 0.8, 0.4, 1, 0.9, 0.5)), row.names = c(NA, 
6L), class = "data.frame")

which is:

  Condition Block Var1
1         A     1  0.6
2         A     2  0.8
3         A     1  0.4
4         B     2  1.0
5         B     1  0.9
6         B     2  0.5

I realize there are alternative ways to get the summary, but it would be good for my learning if I understood how to adjust the function that I have. I just didn't succeed in making it work, and I couldn't find an example to help me here on Stack Overflow. I am looking for something like:

ddply(ExampleData, .c(Condition,Block), summarize,  Average=mean(Var1, na.rm=TRUE), SD=sd(Var1),N=length(Var1), Med =median(Var1))

(or .(Condition*Block) or list(Condition,Block) or ... ??)

Kastany

1 Answer


Just remove the c from the .variables argument, i.e. write .(Condition, Block) instead of .c(Condition,Block), so your code is:

library(plyr)
ddply(ExampleData, .(Condition, Block), summarize,  Average=mean(Var1, na.rm=TRUE), SD=sd(Var1),N=length(Var1), Med =median(Var1))
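
As a side note, .() is only one way to name the grouping columns. As far as I know, .variables also accepts a character vector or a one-sided formula, so the three calls in this sketch should give the same table. The ExampleData from the question is rebuilt here with data.frame() so the snippet runs on its own, and the names by_quoted, by_names and by_formula are just illustrative:

```r
library(plyr)

# Example data from the question, rebuilt with data.frame() for brevity
ExampleData <- data.frame(
  Condition = c("A", "A", "A", "B", "B", "B"),
  Block     = c(1, 2, 1, 2, 1, 2),
  Var1      = c(0.6, 0.8, 0.4, 1, 0.9, 0.5)
)

# 1. Quoted variables via .()
by_quoted <- ddply(ExampleData, .(Condition, Block), summarize,
                   Average = mean(Var1, na.rm = TRUE),
                   SD      = sd(Var1),
                   N       = length(Var1),
                   Med     = median(Var1))

# 2. Character vector of column names
by_names <- ddply(ExampleData, c("Condition", "Block"), summarize,
                  Average = mean(Var1, na.rm = TRUE),
                  SD      = sd(Var1),
                  N       = length(Var1),
                  Med     = median(Var1))

# 3. One-sided formula
by_formula <- ddply(ExampleData, ~ Condition + Block, summarize,
                    Average = mean(Var1, na.rm = TRUE),
                    SD      = sd(Var1),
                    N       = length(Var1),
                    Med     = median(Var1))

# One row per Condition/Block combination
print(by_quoted)
```

Note that groups with a single row (A/2 and B/1 in this data) get SD = NA, because sd() needs at least two values.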

By the way, you might want to switch to using dplyr instead of plyr. https://blog.rstudio.com/2014/01/17/introducing-dplyr/

If you were to do this in dplyr:

library(dplyr)
summarize(group_by(ExampleData, Condition, Block), Average=mean(Var1, na.rm=TRUE), SD=sd(Var1), N=length(Var1), Med=median(Var1))

You could also use the pipe operator (%>%), in which case this becomes:

ExampleData %>% 
  group_by(Condition, Block) %>% 
  summarise(Average=mean(Var1, na.rm=TRUE), 
            SD=sd(Var1),
            N=length(Var1), 
            Med =median(Var1))
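
If the %>% looks unfamiliar: it passes the left-hand side as the first argument of the call on the right, so x %>% f(y) is the same as f(x, y). Here is a small sketch of that equivalence, assuming dplyr is installed. The dplyr:: prefixes are my addition, to keep plyr's summarise from being picked up if plyr was loaded after dplyr in the same session; nested and piped are illustrative names:

```r
library(dplyr)

# Example data from the question, rebuilt with data.frame() for brevity
ExampleData <- data.frame(
  Condition = c("A", "A", "A", "B", "B", "B"),
  Block     = c(1, 2, 1, 2, 1, 2),
  Var1      = c(0.6, 0.8, 0.4, 1, 0.9, 0.5)
)

# Nested form: has to be read from the inside out
nested <- dplyr::summarise(
  dplyr::group_by(ExampleData, Condition, Block),
  Average = mean(Var1, na.rm = TRUE),
  SD      = sd(Var1),
  N       = length(Var1),
  Med     = median(Var1)
)

# Piped form: same steps, read top to bottom
piped <- ExampleData %>%
  dplyr::group_by(Condition, Block) %>%
  dplyr::summarise(Average = mean(Var1, na.rm = TRUE),
                   SD      = sd(Var1),
                   N       = length(Var1),
                   Med     = median(Var1))

# Compare the two results as plain data frames
print(all.equal(as.data.frame(nested), as.data.frame(piped)))
```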
Kerry Jackson
Thank you for taking the time to help! I'll check out dplyr. The "%>%" syntax has scared me off a bit so far, but will do. – Kastany Sep 05 '18 at 13:13