Using purrr and dplyr: is rlang::sym the best way

Question

I'd like to write functions that use dplyr verbs, which means I have to wade into the murky waters of rlang.

To provide a concrete example, say I want to use purrr::map_df() to iterate over variables in a dplyr::group_by(). The programming with dplyr vignette walks through writing a my_summarise() function; the approach there is to use rlang::enquo() on the grouping variable, then unquote with !!. This approach works to make a new dplyr-like function that takes unquoted variable names (my_summarise(df, g1) in the vignette).
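For reference, the enquo() pattern described above can be sketched as follows. This is an adaptation to mtcars rather than the vignette's own code, and the function name my_summarise_unquoted is hypothetical, chosen here to avoid clashing with the string-based version below.

```r
library(dplyr)

# Hypothetical name; takes an unquoted column name, as in the vignette's approach.
my_summarise_unquoted <- function(df, group_var) {
  # Capture the expression the caller typed (e.g. cyl) instead of evaluating it.
  group_var <- rlang::enquo(group_var)

  df %>%
    group_by(!!group_var) %>%   # unquote the captured expression
    summarise(mpg = mean(mpg))
}

res_unquoted <- my_summarise_unquoted(mtcars, cyl)
```

Called this way, the column is written bare (cyl, not "cyl"), which is convenient interactively but awkward when the names arrive as strings from purrr.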

In contrast, I want to provide the variable name as a string. Is rlang::sym() the right way to do this? It seems like it isn't, because sym() isn't mentioned in the dplyr programming vignette and is barely mentioned in the rlang tidy evaluation article. Is there a better way?

library(tidyverse)
my_summarise <- function(df, group_var) {
  group_var <- rlang::sym(group_var)

  df %>%
    group_by(!!group_var) %>%
    summarise(mpg = mean(mpg))
}

# This works. Is that a good thing?
purrr::map_df(c("cyl", "am"), my_summarise, df = mtcars)

# A tibble: 5 x 3
    cyl   mpg    am
  <dbl> <dbl> <dbl>
1  4.00  26.7 NA   
2  6.00  19.7 NA   
3  8.00  15.1 NA   
4 NA     17.1  0   
5 NA     24.4  1.00

As a follow-up, why does simply unquoting (without first applying enquo or sym) work some of the time? In the example below, why does select() work as expected but group_by() doesn't?

x <- "cyl"
select(mtcars, !!x)
group_by(mtcars, !!x)

Update: the answer is not about unquoting. It's that select is more flexible and can handle strings, while group_by can't.
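A small sketch of the difference described in the update, assuming a reasonably recent dplyr. The object names selected, grouped, and grouped_sym are only illustrative.

```r
library(dplyr)

x <- "cyl"

# select() accepts a string, so this keeps the existing cyl column.
selected <- select(mtcars, !!x)
names(selected)

# group_by() sees the literal string "cyl", not the column: it adds a
# constant column and groups on that, so every row lands in one group.
grouped <- group_by(mtcars, !!x)
n_groups(grouped)

# Converting the string to a symbol first groups on the real column.
grouped_sym <- group_by(mtcars, !!rlang::sym(x))
n_groups(grouped_sym)
```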

Other ref: This blog post by Edwin Thoen.

I'm just going to say that recently, using sym() has been solving all my seemingly unsolvable issues with quoting. Sometimes `!!sym()` especially when there is an expression. Also I think it would be better to break this into two questions. — Elin, Jan 06 '18 at 21:11
The draft vignette does use `sym`. http://rpubs.com/lionel-/programming-draft — G. Grothendieck, Jan 06 '18 at 21:57

score 4 · Accepted Answer · answered Feb 14 '18 at 04:46

Short answer: yes.

If you want to map over columns, sym is a fine way to do it. Lionel Henry demonstrates sym in the draft vignette.

In cases where you want to pass a column name, but aren't trying to iterate, Kirill Müller prefers quo. In the example below, they have the same effect.

library(dplyr)

x <- rlang::quo(cyl)
y <- rlang::sym("cyl")
identical(group_by(mtcars, !!x), group_by(mtcars, !!y))  # TRUE
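Although the two give the same result in group_by(), they are different kinds of objects. The sketch below inspects them with rlang predicates; it is an illustration, not part of the original answer.

```r
x <- rlang::quo(cyl)     # a quosure: the expression cyl plus its environment
y <- rlang::sym("cyl")   # a bare symbol built from a string

is_quo <- rlang::is_quosure(x)
is_sym <- rlang::is_symbol(y)

# The expression stored inside the quosure is the same symbol.
same_expr <- identical(rlang::quo_get_expr(x), y)
```

Because sym() starts from a string, it is the one that fits when the column names come from a character vector.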

karldw

With rlang >=0.4.0, this has gotten easier. Use `group_by(.data[[x]])` if `x` is a string. https://www.tidyverse.org/blog/2019/06/rlang-0-4-0/ – karldw Feb 16 '20 at 17:26
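As a sketch of that suggestion, the question's function could be rewritten with the .data pronoun, assuming rlang >= 0.4.0 and a dplyr version that supports it. The name my_summarise_data is hypothetical.

```r
library(dplyr)

# Hypothetical name; group_var is a string such as "cyl".
my_summarise_data <- function(df, group_var) {
  df %>%
    group_by(.data[[group_var]]) %>%   # look the column up by its string name
    summarise(mpg = mean(mpg))
}

# Same call pattern as in the question: iterate over column names as strings.
res_data <- purrr::map_df(c("cyl", "am"), my_summarise_data, df = mtcars)
```

This avoids the explicit sym() and !! steps; the string is only used to index into the data.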
