Using group_by to summarize the data while looping

Question

For example, here is my df:

GP_A <- c(rep("a",3),rep("b",2),rep("c",2))
GP_B <- c(rep("d",2),rep("e",4),rep("f",1))
GENDER <- c(rep("M",4),rep("F",3))
LOC <- c(rep("HK",2),rep("UK",3),rep("JP",2))
SCORE <- c(50,70,80,20,30,80,90)
df <- data.frame(GP_A,GP_B,GENDER,LOC,SCORE)

> df

  GP_A GP_B GENDER LOC SCORE
1    a    d      M  HK    50
2    a    d      M  HK    70
3    a    e      M  UK    80
4    b    e      M  UK    20
5    b    e      F  UK    30
6    c    e      F  JP    80
7    c    f      F  JP    90

What I want is:

result[["GP_A"]] <- df %>% group_by(GP_A,GENDER,LOC) %>% summarize(SCORE=mean(SCORE))
result[["GP_B"]] <- df %>% group_by(GP_B,GENDER,LOC) %>% summarize(SCORE=mean(SCORE))
...

I have tried:

result <- list()
for (i in c("GP_A","GP_B")){
result[[i]] <- df %>% group_by(i,GENDER,LOC) %>% summarize(SCORE=mean(SCORE))
}

Here is the error:

Error: Column `i` is unknown

I also have tried to use setNames, i.e.

... %>% group_by(setNames(nm=i),GENDER,LOC) %>% ...

But it also doesn't work...

How is this different from your previous question? https://stackoverflow.com/questions/60161831/for-loop-to-summarize-and-joining-by-dplyr — Ronak Shah, Feb 11 '20 at 07:35
@Tung Man Lok just replace `group_by(i,GENDER,LOC)` with `group_by(!!sym(i),GENDER,LOC)`, see [Programming with dplyr](https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html) for more info. — A. Suliman, Feb 11 '20 at 07:56
Is there a specific use-case for `for` loop? Usually `for` loop is not used/preferred when using `dplyr` functions. — Ronak Shah, Feb 11 '20 at 08:39

score 0 · Answer 1 · answered Feb 16 '20 at 10:41

The group_by_at() function accepts column names as strings, so it is probably the best fit here: the loop variable can be passed in alongside the fixed grouping columns.

library(dplyr)

# Example data from the question
GP_A <- c(rep("a",3),rep("b",2),rep("c",2))
GP_B <- c(rep("d",2),rep("e",4),rep("f",1))
GENDER <- c(rep("M",4),rep("F",3))
LOC <- c(rep("HK",2),rep("UK",3),rep("JP",2))
SCORE <- c(50,70,80,20,30,80,90)
df <- data.frame(GP_A,GP_B,GENDER,LOC,SCORE)

result <- list()

# i holds a column name as a string; group_by_at() takes a character vector
for (i in c("GP_A", "GP_B")) {
  result[[i]] <- df %>%
    group_by_at(c(i, "GENDER", "LOC")) %>%
    summarise(SCORE = mean(SCORE)) %>%
    ungroup()
}
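
For what it's worth, group_by_at() is marked as superseded from dplyr 1.0.0 onwards, and the comments ask whether a `for` loop is needed at all. A minimal sketch of the same idea without either, assuming dplyr >= 1.0.0 (for across() and the `.groups` argument); `group_cols` is just a name made up for this example:

```r
library(dplyr)  # assumes dplyr >= 1.0.0

# Example data from the question
df <- data.frame(
  GP_A   = c(rep("a", 3), rep("b", 2), rep("c", 2)),
  GP_B   = c(rep("d", 2), rep("e", 4), rep("f", 1)),
  GENDER = c(rep("M", 4), rep("F", 3)),
  LOC    = c(rep("HK", 2), rep("UK", 3), rep("JP", 2)),
  SCORE  = c(50, 70, 80, 20, 30, 80, 90)
)

# Hypothetical name: the columns to swap in as the first grouping key
group_cols <- c("GP_A", "GP_B")

# lapply() over a named vector returns a named list, one summary per column
result <- lapply(setNames(nm = group_cols), function(col) {
  df %>%
    # all_of() selects columns by the strings it is given
    group_by(across(all_of(c(col, "GENDER", "LOC")))) %>%
    # .groups = "drop" returns an ungrouped tibble, so no ungroup() is needed
    summarise(SCORE = mean(SCORE), .groups = "drop")
})

result[["GP_A"]]
result[["GP_B"]]
```

The list is named after the grouping columns, so result[["GP_A"]] and result[["GP_B"]] line up with what the question asks for.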

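Remember that it's good practice to ungroup() your data once you have finished summarising. summarise() only drops the last grouping level, so without ungroup() each result would still be grouped by the remaining columns, and later operations would silently run per group.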
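
To see what grouping is left behind, here is a small sketch that uses group_vars() to inspect the result with and without ungroup(). It relies on summarise()'s default behaviour of dropping only the last grouping level; `summarised` is just a name chosen for this example:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  GP_A   = c(rep("a", 3), rep("b", 2), rep("c", 2)),
  GP_B   = c(rep("d", 2), rep("e", 4), rep("f", 1)),
  GENDER = c(rep("M", 4), rep("F", 3)),
  LOC    = c(rep("HK", 2), rep("UK", 3), rep("JP", 2)),
  SCORE  = c(50, 70, 80, 20, 30, 80, 90)
)

summarised <- df %>%
  group_by_at(c("GP_A", "GENDER", "LOC")) %>%
  summarise(SCORE = mean(SCORE))

# Only LOC was dropped, so GP_A and GENDER are still grouping variables
group_vars(summarised)

# After ungroup() no grouping variables remain
group_vars(ungroup(summarised))

# The leftover grouping changes what later verbs compute:
summarised %>% mutate(n_rows = n())                # n() per GP_A/GENDER group
summarised %>% ungroup() %>% mutate(n_rows = n())  # n() over the whole table
```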
