For example, here is my df:

GP_A <- c(rep("a",3),rep("b",2),rep("c",2))
GP_B <- c(rep("d",2),rep("e",4),rep("f",1))
GENDER <- c(rep("M",4),rep("F",3))
LOC <- c(rep("HK",2),rep("UK",3),rep("JP",2))
SCORE <- c(50,70,80,20,30,80,90)
df <- data.frame(GP_A,GP_B,GENDER,LOC,SCORE)

> df
  GP_A GP_B GENDER LOC SCORE
1    a    d      M  HK    50
2    a    d      M  HK    70
3    a    e      M  UK    80
4    b    e      M  UK    20
5    b    e      F  UK    30
6    c    e      F  JP    80
7    c    f      F  JP    90

What I want is:

result[["GP_A"]] <- df %>% group_by(GP_A,GENDER,LOC) %>% summarize(SCORE=mean(SCORE))
result[["GP_B"]] <- df %>% group_by(GP_B,GENDER,LOC) %>% summarize(SCORE=mean(SCORE))
...

I have tried:

result <- list()
for (i in c("GP_A","GP_B")){
result[[i]] <- df %>% group_by(i,GENDER,LOC) %>% summarize(SCORE=mean(SCORE))
}

Here is the error:

Error: Column `i` is unknown

I have also tried setNames, i.e.:

... %>% group_by(setNames(nm=i),GENDER,LOC) %>% ...

But that doesn't work either.

A. Suliman
xxx
  • How is this different from your previous question? https://stackoverflow.com/questions/60161831/for-loop-to-summarize-and-joining-by-dplyr – Ronak Shah Feb 11 '20 at 07:35
  • @Tung Man Lok just replace `group_by(i,GENDER,LOC)` with `group_by(!!sym(i),GENDER,LOC)`, see [Programming with dplyr](https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html) for more info. – A. Suliman Feb 11 '20 at 07:56
  • @RonakShah I need a for-loop answer for my actual df. – xxx Feb 11 '20 at 08:26
  • Is there a specific use case for a `for` loop? A `for` loop is usually not preferred when using `dplyr` functions. – Ronak Shah Feb 11 '20 at 08:39
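
The substitution suggested in the comments above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it reuses the data from the question and calls `rlang::sym()` with an explicit namespace (the comment writes plain `sym()`; the `rlang::` prefix is an assumption made here so the example does not depend on which helpers dplyr re-exports).

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  GP_A   = c(rep("a", 3), rep("b", 2), rep("c", 2)),
  GP_B   = c(rep("d", 2), rep("e", 4), rep("f", 1)),
  GENDER = c(rep("M", 4), rep("F", 3)),
  LOC    = c(rep("HK", 2), rep("UK", 3), rep("JP", 2)),
  SCORE  = c(50, 70, 80, 20, 30, 80, 90)
)

result <- list()
for (i in c("GP_A", "GP_B")) {
  result[[i]] <- df %>%
    # sym() turns the string in i into a column symbol; !! unquotes it,
    # so group_by() sees GP_A / GP_B instead of a column literally named "i"
    group_by(!!rlang::sym(i), GENDER, LOC) %>%
    summarise(SCORE = mean(SCORE)) %>%
    ungroup()
}

names(result)  # one summary table per grouping column
```

The original attempt fails because `group_by(i, ...)` looks for a column called `i` rather than using the string stored in `i`.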

1 Answer

The group_by_at() function accepts column names as a character vector, so it is probably the best fit here.

library(dplyr)

GP_A <- c(rep("a",3),rep("b",2),rep("c",2))
GP_B <- c(rep("d",2),rep("e",4),rep("f",1))
GENDER <- c(rep("M",4),rep("F",3))
LOC <- c(rep("HK",2),rep("UK",3),rep("JP",2))
SCORE <- c(50,70,80,20,30,80,90)
df <- data.frame(GP_A,GP_B,GENDER,LOC,SCORE)

result <- list()

# Loop over the names of the grouping columns
for (i in c("GP_A", "GP_B")) {
  result[[i]] <-
    df %>%
    # group_by_at() takes the column names as strings
    group_by_at(c(i, "GENDER", "LOC")) %>%
    summarise(SCORE = mean(SCORE)) %>%
    ungroup()
}

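Remember that it's best practice to ungroup() once you finish summarising. By default summarise() drops only the last grouping variable, so without ungroup() each result would still be grouped by i and GENDER, and later operations on it would silently run per group.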
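
To see what ungroup() changes, here is a small sketch that inspects the grouping left over after summarise(). It assumes the same example data and uses `group_vars()` from dplyr; the object name `grouped` is hypothetical and only used for illustration.

```r
library(dplyr)

# Same example data as above
df <- data.frame(
  GP_A   = c(rep("a", 3), rep("b", 2), rep("c", 2)),
  GP_B   = c(rep("d", 2), rep("e", 4), rep("f", 1)),
  GENDER = c(rep("M", 4), rep("F", 3)),
  LOC    = c(rep("HK", 2), rep("UK", 3), rep("JP", 2)),
  SCORE  = c(50, 70, 80, 20, 30, 80, 90)
)

# Summarise without calling ungroup() afterwards
grouped <- df %>%
  group_by_at(c("GP_A", "GENDER", "LOC")) %>%
  summarise(SCORE = mean(SCORE))

# Grouping variables that survive summarise(): only LOC has been dropped
leftover <- group_vars(grouped)

# After ungroup() no grouping variables remain
cleared <- group_vars(ungroup(grouped))

print(leftover)
print(cleared)
```

With the leftover grouping in place, a later `mutate()` or `summarise()` on `grouped` would be computed within each GP_A/GENDER combination rather than over the whole table.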

Griff