So here is the problem: I want to use a for loop in my R code to summarize different columns.

As an example, here is what it could look like:

all.columns<-c("column4","column5","column6","column7")
for (i in 1:4) {  
df%>%
 group_by(column3)%>%
 summarise(Mean=mean(all.columns[i]),
           Max=max(all.columns[i]))
} 

Here df is a data frame, column3 is the grouping variable (a Year variable, for example), and columns 4 to 7 are the ones I want to summarize repeatedly with the same code.

Do you know how to do this with dplyr? If you have an alternative without dplyr, I'd like to hear about it.

I've tried passing the column name as a character string, but it's not working...

M. Beausoleil
  • Maybe add a `%>% print` on the end. I'm not really clear on what you're trying to do. Example data might help. – Frank Sep 01 '15 at 15:56
  • Please add some sample data that matches the structure you have in mind and, ideally, an illustration of the desired output. As is, I can't tell if your grouping variable is repeated across rows, if you'll have to deal with missing values, etc. – ulfelder Sep 01 '15 at 15:57
  • What is your desired output? Do you want objects, one data frame for each column's summary? Using string column names you'll need to use the standard-evaluating `summarise_()`... there's [a whole vignette on the topic](https://cran.rstudio.com/web/packages/dplyr/vignettes/nse.html). Or maybe look into `summarize_each` and get yourself one big summary data frame without any looping. – Gregor Thomas Sep 01 '15 at 15:58
  • Thanks Gregor, that is exactly the simplest way of doing it! summarise_each(funs(mean, max), column4,column5,column6) http://stackoverflow.com/questions/21644848/summarizing-multiple-columns-with-dplyr – M. Beausoleil Sep 01 '15 at 16:14

2 Answers

How about this:

Fake data:

df <- data.frame(column3=rep(letters[1:2], 10), 
                 column4=rnorm(20),
                 column5=rnorm(20),
                 column6=rnorm(20),
                 column7=rnorm(20))

dplyr solution:

library(dplyr)
df %>% 
  group_by(column3) %>% 
  summarise_each(funs(mean, max), column4:column7)

Output:

Source: local data frame [2 x 9]

  column3 column4_mean column5_mean column6_mean column7_mean column4_max column5_max
1       a     0.186458   0.02662053  -0.00874544    0.3327999    1.563171    2.416697
2       b     0.336329  -0.08868817   0.31777871    0.1934266    1.263437    1.142430
Variables not shown: column6_max (dbl), column7_max (dbl)
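
Note for newer dplyr versions: summarise_each() and funs() have since been deprecated. Assuming dplyr 1.0.0 or later, the same summary can probably be written with across(); this is a sketch of the equivalent call, not part of the original answer. The set.seed() line is only there to make the fake data reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: any seed works, this just makes the fake data reproducible
df <- data.frame(column3 = rep(letters[1:2], 10),
                 column4 = rnorm(20),
                 column5 = rnorm(20),
                 column6 = rnorm(20),
                 column7 = rnorm(20))

# One row per group; the named list gives columns such as
# column4_mean, column4_max, column5_mean, ...
res <- df %>%
  group_by(column3) %>%
  summarise(across(column4:column7, list(mean = mean, max = max)))

res
```

The named list passed to across() controls the suffixes of the output columns, so the result has the same information as the summarise_each() output above, only in a different column order.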
Andrew Taylor
0

This doesn't work because you're calling column names as if they're objects when you have them stored as characters.

I know this can be done with data.table:

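library(data.table)
# Column names to summarise, as defined in the question
all.columns <- c("column4", "column5", "column6", "column7")
dt = data.table(df)
dt[, lapply(.SD, function(x) data.table(mean(x), max(x))),
    by = column3, .SDcols = all.columns]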
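
If you do want to keep the for loop from the question in dplyr, one possible approach (a sketch, assuming a reasonably recent dplyr that provides the .data pronoun) is to look each column up by its string name with .data[[col]] and to store every summary in a list, since a loop does not print or keep its results on its own. The names results and col, and the set.seed() call, are mine and not from the question.

```r
library(dplyr)

set.seed(1)  # assumption: only here to make the fake data reproducible
df <- data.frame(column3 = rep(letters[1:2], 10),
                 column4 = rnorm(20),
                 column5 = rnorm(20),
                 column6 = rnorm(20),
                 column7 = rnorm(20))

all.columns <- c("column4", "column5", "column6", "column7")

# 'results' is a hypothetical container: one summary data frame per column
results <- list()
for (col in all.columns) {
  results[[col]] <- df %>%
    group_by(column3) %>%
    # .data[[col]] fetches the column whose name is stored in the string 'col'
    summarise(Mean = mean(.data[[col]]),
              Max  = max(.data[[col]]))
}

results[["column4"]]
```

Each element of results is a small data frame with one row per group, so results[["column5"]] gives the Mean and Max of column5 by column3.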
Señor O