How to create new variables based on an external named list/vector of computations dplyr

Question

Imagine I want to do the following operation:

library(dplyr)
mtcars %>%
    group_by(x = am) %>%
    summarize(y = sum(vs),
              q = sum(mpg),
              p = sum(mpg/vs))

which yields:

#> # A tibble: 2 × 4
#>       x     y     q     p
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     0     7  326.   Inf
#> 2     1     7  317.   Inf

However, I would like to do the groupings and the summary based on these two external vectors:

x_groups <- c("x" = "am")
y_now <- c("y" = "vs", "q" = "mpg", "p" = "mpg/vs")

How can I have the same result but through a programmatic, non-standard evaluation approach?

score 4 · Accepted Answer · answered Aug 30 '23 at 15:51

You can parse your strings into expressions. The grouping is straightforward; for the summarize step, each parsed expression first has to be wrapped in a call to sum(). You can do that with:

grpexpr <- rlang::parse_exprs(x_groups)
sexpr <- rlang::parse_exprs(y_now) |> lapply(function(x) bquote(sum(.(x))))

Since those are named lists, you can inject them into the call with !!!:

mtcars %>%
  group_by(!!!grpexpr) %>%
  summarize(!!!sexpr)
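
As a rough sketch of how the same pattern extends to more than one grouping variable, the inputs below are hypothetical: the two-group `x_groups` follows the example raised in the comments, and the `y_now` entries were picked only for illustration. It assumes `rlang::parse_exprs()` keeps the names of the character vector, which is what makes the new column names come out right.

```r
library(dplyr)

# Hypothetical inputs: two grouping variables and two columns to sum
x_groups <- c("x" = "am", "z" = "vs")
y_now <- c("q" = "mpg", "n" = "cyl")

# Parse the strings into named lists of expressions
grpexpr <- rlang::parse_exprs(x_groups)
# Wrap each summary expression in sum(), e.g. mpg becomes sum(mpg)
sexpr <- lapply(rlang::parse_exprs(y_now), function(x) bquote(sum(.(x))))

# Inject both lists; the list names become the output column names
res <- mtcars %>%
  group_by(!!!grpexpr) %>%
  summarize(!!!sexpr, .groups = "drop")

res
```

The result should have one row per combination of am and vs, with the grouping columns named x and z.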

score 1 · Answer 2 · answered Aug 30 '23 at 14:51

There are two ways to solve this, depending on whether you can modify the inputs. If you can supply a different input, such as a list, I would opt for Approach 1.

Approach 1:

Replace y_now with a list of the computations you need, defusing each expression by wrapping it in rlang::expr(). Then adapt the group_by and summarise calls to accept external inputs: use the := notation in group_by for naming, and !!! to inject the defused expressions. This is how it would look:

x_groups <- c("x" = "am")
y_now <- list(y = rlang::expr(sum(vs)), q = rlang::expr(sum(mpg)), p = rlang::expr(sum(mpg/vs)))
mtcars %>% 
  group_by(!!sym(names(x_groups)) := !!as.name(x_groups)) %>% 
  summarise(!!!y_now)
#> # A tibble: 2 × 4
#>       x     y     q     p
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     0     7  326.   Inf
#> 2     1     7  317.   Inf

Approach 2:

In this case, you cannot change the input and have to work with what you've been given. Transform it into the same kind of object as the list y_now from Approach 1: convert the vector into a list, then turn each string into a call. After that, apply the same injection operators as in Approach 1.

x_groups <- c("x" = "am")
y_now  <- c("y" = "vs", "q" = "mpg", "p" = "mpg / vs")
y_now <- as.list(y_now) %>% 
  purrr::map(\(variable) str2lang(paste0("sum(", variable, ")")))
mtcars %>% 
  group_by(!!sym(names(x_groups)) := !!as.name(x_groups)) %>% 
  summarise(!!!y_now)
#> # A tibble: 2 × 4
#>       x     y     q     p
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     0     7  326.   Inf
#> 2     1     7  317.   Inf
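
If this is needed in several places, the steps of Approach 2 could be wrapped in a small helper. The following is only a sketch: `summarise_sums()` and its argument names are hypothetical, and it builds the calls with `call("sum", ...)` rather than pasting strings together, which is a swapped-in variation on the code above. It also injects the groups with !!!, so more than one grouping variable should work.

```r
library(dplyr)

# Hypothetical helper (not part of the answer): takes named character
# vectors and returns per-group sums under the given names.
summarise_sums <- function(data, groups, sums) {
  # Parse each grouping string into an expression; names are kept by lapply()
  group_exprs <- lapply(groups, str2lang)
  # Build sum(<expr>) calls, e.g. "mpg / vs" becomes sum(mpg / vs)
  sum_exprs <- lapply(sums, function(s) call("sum", str2lang(s)))
  data %>%
    group_by(!!!group_exprs) %>%
    summarise(!!!sum_exprs, .groups = "drop")
}

res <- summarise_sums(
  mtcars,
  groups = c("x" = "am"),
  sums = c("y" = "vs", "q" = "mpg", "p" = "mpg / vs")
)

res
```

With the inputs from the question, this should give the same four columns (x, y, q, p) as the outputs shown above.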

For your approach 1, it would be easier to define `y_now <- rlang::exprs(y = sum(vs), q = sum(mpg), p = sum(mpg/vs))` so you don't have to repeat `rlang::expr` as often. Also that solution would not work if you had more than one group, ie `x_groups <- c("x" = "am", "z" = "vs")`. The solution I posted would work in that case as well. — MrFlick, Aug 30 '23 at 20:07

TarJae · Answer 3 · 2023-08-30T15:37:16.657

To methodically follow the process (though it may not adhere strictly to the DRY principle):

Generally we use the !! (bang-bang) operator to unquote, which is the basis of non-standard evaluation (NSE) within tidyverse functions.

With !!names(y_now)[x] := sum(!!sym(y_now[[x]])) we create a column named after the x-th element of y_now, holding the sum of the column that element refers to (for the first element, y = sum(vs)).

The issue arises with element 3 of y_now: there is no column named mpg/vs, so sym() cannot be used. Instead we parse and evaluate the string: sum(eval(parse(text = y_now[[3]])))

library(dplyr)
library(rlang)

x_groups <- c("x" = "am")
y_now <- c("y" = "vs", "q" = "mpg", "p" = "mpg/vs")


mtcars %>% 
  group_by(!!sym(x_groups[[1]])) %>% 
  summarize(
    !!names(y_now)[1] := sum(!!sym(y_now[[1]])),
    !!names(y_now)[2] := sum(!!sym(y_now[[2]])),
    !!names(y_now)[3] := sum(eval(parse(text = y_now[[3]])))
  )

# A tibble: 2 x 4
     am     y     q     p
  <dbl> <dbl> <dbl> <dbl>
1     0     7  326.   Inf
2     1     7  317.   Inf

This works, but the grouping column is called "am", whereas the original intention of the code is to create a new column called "x". The `summarise` part seems fine, but it doesn't feel programmatic enough if you were to be blind about where `mpg/vs` appears. — Alberto Agudo Dominguez, Aug 31 '23 at 06:57
I see. I think Mr.Flick's solution is the one to go with here! — TarJae, Aug 31 '23 at 07:07
