What is the process of applying a dplyr function to a list of values?

Question

I have created a dplyr function to evaluate counts of events for a population. The code works when used with explicit naming of variables within the dplyr::filter and dplyr::group_by functions.

I need to apply the function to 24 variables which are column headers within a data frame. Here they are referred to as x.

I have used !! because, as I understand it, x would otherwise be evaluated as a string rather than as a column name.

The function

summary_table <- function(x){
  assign(paste(x,"sum_tab", sep="_"),
         envir = parent.frame(),
         value = df %>%
  filter(!is.na(!!x)) %>%
  group_by(!!x) %>%
  summarise(
           'Variable name' = paste0(x),
            Discharged = sum(admission_status == "Discharged"),
           'Re-attended' = sum(!is.na(re_admission_status)),
           'Admitted on Re-attendance' = sum(re_admission_status == "Admitted", na.rm = TRUE)))
}

I have used:

sapply(var_names, summary_table)

However, this outputs only one row of the table for each variable in var_names.

In summary I would like pointers to the correct mechanism to apply the function written above to a list of column names within the dplyr pipe.

Reproducible example

example <- mtcars %>%
  group_by(vs) %>%
  summarise(
    '6 cylinder' = sum(cyl == 6),
    'Large disp' = sum(disp >= 100),
    'low gears' = sum(gear <= 4))

In this example we would want to apply this summary to each column named in the following vector

cars_var <- c("vs", "am", "carb")

This would produce three tables, one for each column in the vector.

Can you provide a data sample that works with your code? That will make it much easier to understand what your code is attempting to do and develop a solution. What did you intend the `assign` statement to do? The `assign` step is probably unnecessary and undesirable. The `!!` is "unquoting" the `x` argument. But for that to work, `x` first has to be "quoted" (or "quasiquoted" in this case) by doing `x = enquo(x)` at the beginning of the function. See the [programming with `dplyr`](https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html) vignette for more info on this. — eipi10, Apr 23 '19 at 22:54
For example, [here's a recent answer](https://stackoverflow.com/a/55426189/496488) I wrote that uses `enquo` and `!!` in a function. — eipi10, Apr 23 '19 at 22:57
Sorry for the delay in replying. I was using assign in the function because I wanted to output a named variable for each iteration of the function. I am unsure whether I have implemented this correctly. I have looked over your post and the chapter by Wickham. There seems to be a contradiction as to whether I should use x = enquo(var_names) and then !!x, or enquos(...) and then call with !!! — hisspott, Apr 24 '19 at 11:49
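
A minimal sketch of the distinction these comments turn on, assuming dplyr 1.0.0 or later (newer than this thread). `enquo()` and `!!` apply when a function receives a bare column name, whereas `var_names` in the question holds strings. For strings, the `.data` pronoun and `across(all_of())` look columns up by name. The helper `summarise_by()` and its arguments are hypothetical, and `mtcars` stands in for the asker's `df`.

```r
library(dplyr)

# Hypothetical helper: `var` is a column name supplied as a string.
summarise_by <- function(data, var) {
  data %>%
    # .data[[var]] looks the column up by its string name
    filter(!is.na(.data[[var]])) %>%
    # all_of() selects the column named by the string in `var`
    group_by(across(all_of(var))) %>%
    summarise(
      `6 cylinder` = sum(cyl == 6),
      `Large disp` = sum(disp >= 100),
      `low gears`  = sum(gear <= 4)
    )
}

res <- summarise_by(mtcars, "vs")
res
```

Because the function returns its result rather than calling `assign()`, the caller decides where each table is stored.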

nacnudus · Accepted Answer · 2019-05-02T19:42:24.580

As @eipi10 commented, it is usually unwise to automatically create variables. A better idea is to create a single variable that is a list of data frames.

It is also easier to let users apply the groups themselves with group_by() or group_by_at(), so that you don't have to worry about how they provide the names of the variables.

EDIT 2019-05-02

One way is to regard the names of the grouping variables as the 'data', and map over them, creating a copy of the actual data grouped by each one of the grouping variables.

library(dplyr)
library(purrr)

grouping_vars <- c("vs", "am", "carb")
map(grouping_vars, group_by_at, .tbl = mtcars) %>%
  map(summarise,
      '6 cylinder' = sum(cyl == 6),
      'Large disp' = sum(disp >= 100),
      'low gears' = sum(gear <= 4))
#> [[1]]
#> # A tibble: 2 x 4
#>      vs `6 cylinder` `Large disp` `low gears`
#>   <dbl>        <int>        <int>       <int>
#> 1     0            3           18          14
#> 2     1            4            9          13
#> 
#> [[2]]
#> # A tibble: 2 x 4
#>      am `6 cylinder` `Large disp` `low gears`
#>   <dbl>        <int>        <int>       <int>
#> 1     0            4           19          19
#> 2     1            3            8           8
#> 
#> [[3]]
#> # A tibble: 6 x 4
#>    carb `6 cylinder` `Large disp` `low gears`
#>   <dbl>        <int>        <int>       <int>
#> 1     1            2            4           7
#> 2     2            0            8           8
#> 3     3            0            3           3
#> 4     4            4           10           9
#> 5     6            1            1           0
#> 6     8            0            1           0

Created on 2019-05-02 by the reprex package (v0.2.1)
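
If the aim of the original `assign()` call was to have each table retrievable by variable name, one possible variation on the code above is to name the vector before mapping, so that the resulting list carries those names. This is a sketch only; `summary_tables` is a hypothetical name.

```r
library(dplyr)
library(purrr)

grouping_vars <- c("vs", "am", "carb")

# set_names() with no further arguments names each element after itself,
# so map() returns a list named by grouping variable.
summary_tables <- grouping_vars %>%
  set_names() %>%
  map(group_by_at, .tbl = mtcars) %>%
  map(summarise,
      `6 cylinder` = sum(cyl == 6),
      `Large disp` = sum(disp >= 100),
      `low gears`  = sum(gear <= 4))

names(summary_tables)
summary_tables$carb
```

Each table is then available as `summary_tables$vs`, `summary_tables$am`, and so on, without creating separate variables in the calling environment.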

Original answer

Here is a function that uses dplyr::groups() to find out which variables have been grouped. Then it iterates over each grouping variable, summarises, and appends the resulting data frame to a list.

library(dplyr)

margins <- function(.data, ...) {
  groups <- dplyr::groups(.data)
  n <- length(groups)
  out <- vector(mode = "list", length = n)
  for (i in rev(seq_len(n))) {
    out[[i]] <-
      .data %>%
      dplyr::group_by(!!groups[[i]]) %>%
      dplyr::summarise(...) %>%
      dplyr::group_by(!!groups[[i]]) # Reapply the original group
  }
  out
}

mtcars %>%
  group_by(vs, am, carb) %>%
  margins('6 cylinder' = sum(cyl == 6),
          'Large disp' = sum(disp >= 100),
          'low gears' = sum(gear <= 4))
#> [[1]]
#> # A tibble: 2 x 4
#> # Groups:   vs [2]
#>      vs `6 cylinder` `Large disp` `low gears`
#>   <dbl>        <int>        <int>       <int>
#> 1     0            3           18          14
#> 2     1            4            9          13
#> 
#> [[2]]
#> # A tibble: 2 x 4
#> # Groups:   am [2]
#>      am `6 cylinder` `Large disp` `low gears`
#>   <dbl>        <int>        <int>       <int>
#> 1     0            4           19          19
#> 2     1            3            8           8
#> 
#> [[3]]
#> # A tibble: 6 x 4
#> # Groups:   carb [6]
#>    carb `6 cylinder` `Large disp` `low gears`
#>   <dbl>        <int>        <int>       <int>
#> 1     1            2            4           7
#> 2     2            0            8           8
#> 3     3            0            3           3
#> 4     4            4           10           9
#> 5     6            1            1           0
#> 6     8            0            1           0

Created on 2019-04-24 by the reprex package (v0.2.1.9000)

If you want to group with a vector of variable names, you can use dplyr::group_by_at() and dplyr::vars().

cars_var <- c("vs", "am", "carb")

mtcars %>%
  group_by_at(vars(cars_var)) %>%
  margins('6 cylinder' = sum(cyl == 6),
          'Large disp' = sum(disp >= 100),
          'low gears' = sum(gear <= 4))
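
In dplyr 1.0.0 and later, `group_by_at()` is superseded. Assuming such a version, a sketch of the equivalent grouping step uses `across()` with `all_of()`. Only the grouping is shown here; the result could be piped into `margins()` as above.

```r
library(dplyr)

cars_var <- c("vs", "am", "carb")

# all_of() selects exactly the columns named in the character vector
grouped <- mtcars %>%
  group_by(across(all_of(cars_var)))

group_vars(grouped)
#> [1] "vs"   "am"   "carb"
```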

I am the author of a small package called armgin that implements this and a few similar ideas.

Thanks, this worked perfectly with my data set and I was able to extract 30+ summary tables without duplicating code. — hisspott, Apr 24 '19 at 21:59
Thanks, I thought that there would be a map version of the solution. That's great. — hisspott, May 03 '19 at 12:56
