
I want to summarise several columns of a data.frame. I do the grouping and summarising with dplyr, as in the example below.

df = data.frame (time = rep(c("day", "night"), 10) , 
    who =rep(c("Paul", "Simon"), each=10) , 
    var1 = runif(20, 5, 15), var2 = runif(20, 10, 12), var3 = runif(20, 2, 7), var4 = runif(20, 1, 3)) 

Writing the function I need:

quantil_x = function (var, num) { quantile(var, num, na.rm=T) }

Applying it to var1 and exporting the result:

percentiles = df %>% group_by(time, who) %>% summarise(
    P0 = quantil_x (var1, 0),
    P25 = quantil_x (var1, .25),
    P75 = quantil_x (var1, .75)
    )
write.table(percentiles, file = "summary_var1.csv",row.names=FALSE, dec=",",sep=";")

What I want is to repeat this same task for 'var2', 'var3' and 'var4'. I tried to do this in a loop, without success: I couldn't find a way to refer to a different variable on each iteration. Inside the loop I tried summarise_(), get() (both inside the function quantil_x() and directly within summarise()), and as.name(), but none of these worked.

I suspect the problem is my own inexperience, but this is all I have come up with so far. Here is an example of what I tried:

list = c("var1", "var2", "var3", "var4")
for (i in list){
percentiles = df %>% group_by(time, who) %>% summarise(
    P0 = quantil_x (get(i), 0),
    P25 = quantil_x (get(i), .25),
    P75 = quantil_x (get(i), .75)
    )
write.table(percentiles, file = paste0("summary_",i,".csv"),row.names=FALSE, dec=",",sep=";")
}
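For reference, a minimal sketch of how a loop like this is often written with the `.data` pronoun, which looks a column up by its name as a string. It assumes dplyr >= 1.0.0 (for the `.groups` argument); the names `vars` and `out_dir` and the use of `tempdir()` are illustrative assumptions, not part of the original attempt.

```r
library(dplyr)

# Same example data as above
df <- data.frame(
  time = rep(c("day", "night"), 10),
  who  = rep(c("Paul", "Simon"), each = 10),
  var1 = runif(20, 5, 15), var2 = runif(20, 10, 12),
  var3 = runif(20, 2, 7),  var4 = runif(20, 1, 3)
)

quantil_x <- function(var, num) quantile(var, num, na.rm = TRUE, names = FALSE)

vars    <- c("var1", "var2", "var3", "var4")  # avoids masking base::list
out_dir <- tempdir()                          # hypothetical output folder

for (v in vars) {
  percentiles <- df %>%
    group_by(time, who) %>%
    summarise(
      P0  = quantil_x(.data[[v]], 0),    # .data[[v]] selects the column named by v
      P25 = quantil_x(.data[[v]], 0.25),
      P75 = quantil_x(.data[[v]], 0.75),
      .groups = "drop"
    )
  # One file per variable, e.g. summary_var1.csv
  write.table(percentiles,
              file = file.path(out_dir, paste0("summary_", v, ".csv")),
              row.names = FALSE, dec = ",", sep = ";")
}
```

Each iteration produces a table with one row per time/who combination and writes it to its own file.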

I read this post, but it didn't help much in my case.

Thanks in advance.

dudu

2 Answers


You can do this with summarise_each():

df %>% 
 group_by(time, who) %>% 
 summarise_each(funs (`0` = quantile(., 0, na.rm=T),
                      `25`= quantile(., .25, na.rm = T),
                      `75`= quantile(., .75, na.rm = T)))
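In newer dplyr releases, `summarise_each()` and `funs()` are deprecated. A roughly equivalent sketch using `across()` is given below, assuming dplyr >= 1.0.0; the output name `percentiles_wide` and the `P0`/`P25`/`P75` labels are chosen here for illustration.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  time = rep(c("day", "night"), 10),
  who  = rep(c("Paul", "Simon"), each = 10),
  var1 = runif(20, 5, 15), var2 = runif(20, 10, 12),
  var3 = runif(20, 2, 7),  var4 = runif(20, 1, 3)
)

percentiles_wide <- df %>%
  group_by(time, who) %>%
  summarise(
    across(
      var1:var4,
      list(
        P0  = ~ quantile(.x, 0,    na.rm = TRUE, names = FALSE),
        P25 = ~ quantile(.x, 0.25, na.rm = TRUE, names = FALSE),
        P75 = ~ quantile(.x, 0.75, na.rm = TRUE, names = FALSE)
      )
    ),
    .groups = "drop"
  )
```

With the default naming scheme, the result has one column per variable and statistic (`var1_P0`, `var1_P25`, and so on) and one row per time/who group.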
Lucy

You can do this with gather():

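library(dplyr)
library(tidyr)  # provides gather()

percentiles = df %>%
  # Stack the four variables into long format; var4 is included so that
  # all variables from the question are summarised
  gather(Var, Value, var1, var2, var3, var4) %>%
  group_by(Var, time, who) %>%
  summarise(
    P0 = quantil_x (Value, 0),
    P25 = quantil_x (Value, .25),
    P75 = quantil_x (Value, .75)
    )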
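In current tidyr, `gather()` is superseded by `pivot_longer()`. A sketch of the same idea under that assumption (tidyr >= 1.0.0 and dplyr >= 1.0.0) follows; the name `percentiles_long` is illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data and helper as in the question
df <- data.frame(
  time = rep(c("day", "night"), 10),
  who  = rep(c("Paul", "Simon"), each = 10),
  var1 = runif(20, 5, 15), var2 = runif(20, 10, 12),
  var3 = runif(20, 2, 7),  var4 = runif(20, 1, 3)
)
quantil_x <- function(var, num) quantile(var, num, na.rm = TRUE, names = FALSE)

percentiles_long <- df %>%
  # Stack var1..var4 into a name column (Var) and a value column (Value)
  pivot_longer(var1:var4, names_to = "Var", values_to = "Value") %>%
  group_by(Var, time, who) %>%
  summarise(
    P0  = quantil_x(Value, 0),
    P25 = quantil_x(Value, 0.25),
    P75 = quantil_x(Value, 0.75),
    .groups = "drop"
  )
```

This gives one row per variable and time/who group, which can be exported in a single file with the same `write.table()` call used in the question.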
Bishops_Guest
  • This approach has the advantage of producing more rows than columns. It also orders the rows more logically than the other answer orders its columns. – dudu Mar 07 '17 at 12:54