Iterate Group_by across Dataframe in R

Question

I'm trying to simplify a current piece of code in my script.

I want to group by each possible combination of two categorical variables and summarise a mean value of my explanatory variable.

Example using the mpg dataset from ggplot2:

library(tidyverse)

mpg %>% group_by(manufacturer, model) %>% summarise(mean = mean(hwy))
mpg %>% group_by(manufacturer, year) %>% summarise(mean = mean(hwy))
mpg %>% group_by(manufacturer, cyl) %>% summarise(mean = mean(hwy))

(this would continue until every pairwise combination of categorical columns has been covered)

mpg %>% group_by(cyl, year) %>% summarise(mean = mean(hwy))

etc...

My actual dataset has hundreds of categorical variables, so I would like to iterate the process, for example in a for loop or with purrr.

Thanks

score 1 · Accepted Answer · answered Nov 13 '19 at 12:10

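This uses purrr to select the character and factor columns, and then combn() to generate every pair of them. Note that year and cyl are stored as integers in mpg, so they are not picked up unless you convert them to factors first.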

library(ggplot2)
library(purrr)
library(dplyr)

# Keep the names of the character and factor columns, then summarise each pair
map_lgl(mpg, ~ is.character(.) | is.factor(.)) %>%
  names(.)[.] %>%
  combn(2, function(x) {
    mpg %>% group_by_at(x) %>% summarize(mean = mean(hwy))
  }, simplify = FALSE)

Note that this can get unwieldy: choose(100, 2) evaluates to 4,950 combinations, so 100 categorical columns would produce 4,950 data frames.
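
With that many results, it may help to name each list element after the pair of columns it was grouped by. The following is only a sketch of one way to do that, not part of the original answer: it assumes dplyr 1.0.0 or later (for across() and the .groups argument), and the names cat_cols, pairs and results are made up for illustration.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)
library(purrr)

# Names of the character and factor columns (same selection as above).
cat_cols <- names(mpg)[map_lgl(mpg, ~ is.character(.x) || is.factor(.x))]

# One pair of column names per list element.
pairs <- combn(cat_cols, 2, simplify = FALSE)

# Label each pair, e.g. "manufacturer_model", so results can be looked up by name.
names(pairs) <- map_chr(pairs, paste, collapse = "_")

# map() keeps the names, so results is a named list of summary data frames.
results <- map(pairs, function(cols) {
  mpg %>%
    group_by(across(all_of(cols))) %>%
    summarise(mean = mean(hwy), .groups = "drop")
})

results[["manufacturer_model"]]
```

Here group_by(across(all_of(cols))) stands in for group_by_at(), which still works but is superseded in current dplyr.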

Cole

Thanks. I intend to filter the resulting list of data frames on certain criteria, e.g. the max mean value in each data frame, to make it more manageable. – JmezR Nov 13 '19 at 13:58
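
A filter along those lines could look roughly like the sketch below. It assumes dplyr 1.0.0 or later and uses purrr::keep(); the threshold value and the names cat_cols, results and filtered are hypothetical, chosen only for illustration.

```r
library(ggplot2)  # provides the mpg dataset
library(dplyr)
library(purrr)

# Character and factor columns, as in the answer.
cat_cols <- names(mpg)[map_lgl(mpg, ~ is.character(.x) || is.factor(.x))]

# One summary data frame per pair of columns.
results <- map(combn(cat_cols, 2, simplify = FALSE), function(cols) {
  mpg %>%
    group_by(across(all_of(cols))) %>%
    summarise(mean = mean(hwy), .groups = "drop")
})

# Hypothetical cutoff: keep a data frame only if its largest group mean exceeds it.
threshold <- 35

# keep() retains the list elements for which the predicate returns TRUE.
filtered <- keep(results, function(df) max(df$mean) > threshold)

length(filtered)
```

Swapping the predicate passed to keep() is enough to filter on a different criterion, such as the number of groups in each data frame.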