I'd like to write functions that use dplyr verbs, which means I have to wade into the murky waters of rlang.
To provide a concrete example, say I want to use purrr::map_df() to iterate over variables in a dplyr::group_by(). The programming with dplyr vignette walks through writing a my_summarise() function; the approach there is to use rlang::enquo() on the grouping variable, then unquote with !!.
This approach works to make a new dplyr-like function that takes unquoted variable names (my_summarise(df, g1) in the vignette).
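For reference, here is a minimal sketch of that enquo()/!! pattern, adapted to mtcars so it runs on its own. The function name my_summarise_bare and the choice of mtcars are my own assumptions for illustration, not the vignette's code.

```r
library(dplyr)

# Sketch of the enquo()/!! pattern described above, adapted to mtcars.
# `my_summarise_bare` is a hypothetical name used only for this illustration.
my_summarise_bare <- function(df, group_var) {
  group_var <- rlang::enquo(group_var)  # capture the unquoted column name
  df %>%
    group_by(!!group_var) %>%           # unquote it inside group_by()
    summarise(mpg = mean(mpg))
}

# The grouping column is passed bare, not as a string.
res_bare <- my_summarise_bare(mtcars, cyl)
```

Called this way, the function behaves like a dplyr verb, which is exactly why it is awkward to drive from a character vector of column names.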
In contrast, I want purrr to supply the variable name as a string. Is rlang::sym() the right way to do this? It seems like it isn't, because sym() isn't mentioned in the dplyr programming vignette and is barely mentioned in the rlang tidy evaluation article. Is there a better way?
library(tidyverse)

my_summarise <- function(df, group_var) {
  group_var <- rlang::sym(group_var)
  df %>%
    group_by(!!group_var) %>%
    summarise(mpg = mean(mpg))
}

# This works. Is that a good thing?
purrr::map_df(c("cyl", "am"), my_summarise, df = mtcars)
# A tibble: 5 x 3
    cyl   mpg    am
  <dbl> <dbl> <dbl>
1  4.00  26.7 NA
2  6.00  19.7 NA
3  8.00  15.1 NA
4 NA     17.1  0
5 NA     24.4  1.00

As a follow-up, why does simply unquoting (without first applying enquo or sym) work some of the time? In the example below, why does select() work as expected but group_by() doesn't?

x <- "cyl"
select(mtcars, !!x)
group_by(mtcars, !!x)

Update: the answer is not about unquoting. It's that select() is more flexible and accepts strings as column names, while group_by() doesn't.
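To make the difference concrete, here is a small sketch comparing the three calls. As far as I can tell, group_by() treats the inlined string as a constant expression rather than a column name, so it ends up with a single group; the variable names selected, by_string and by_symbol are just mine for illustration.

```r
library(dplyr)

x <- "cyl"

# select() accepts a string, so the inlined "cyl" picks the cyl column.
selected <- select(mtcars, !!x)

# group_by() evaluates the inlined string as a constant expression, so every
# row gets the same value and there is only one group.
by_string <- group_by(mtcars, !!x)

# Converting the string to a symbol first groups by the existing cyl column.
by_symbol <- group_by(mtcars, !!rlang::sym(x))

n_groups(by_string)
n_groups(by_symbol)
```

So the sym() step is what turns the string into a reference to the column; the !! on its own only inlines whatever value x holds.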
Other ref: This blog post by Edwin Thoen.