Question on R programming with dplyr and tidy evaluation

Question

Folks, I have a couple of questions about how tidy evaluation works with dplyr.

The following code produces a tally of cars by cylinder using the mtcars dataset:

mtcars %>%
  select(cyl) %>%
  group_by(cyl) %>%
  tally()

With output as expected:

# A tibble: 3 x 2
    cyl     n
* <dbl> <int>
1     4    11
2     6     7
3     8    14

If I want to pass the grouping factor as a variable, then this fails:

var <- "cyl"

mtcars %>%
  select(var) %>%
  group_by(var) %>%
  tally()

with error message:

Error: Must group by variables found in `.data`.
* Column `var` is not found.

This also fails:

var <- "cyl"

mtcars %>%
  select(var) %>%
  group_by({{ var}}) %>%
  tally()

Producing output:

# A tibble: 1 x 2
  `"cyl"`     n
* <chr>   <int>
1 cyl        32

This code, however, works as expected:

var <- "cyl"

mtcars %>%
  select(var) %>%
  group_by(.data[[ var]]) %>%
  tally()

Producing the expected output:

# A tibble: 3 x 2
    cyl     n
* <dbl> <int>
1     4    11
2     6     7
3     8    14

I have two questions about this and am wondering if someone can help!

Why does select(var) work fine without using any of the dplyr tidy evaluation extensions, such as select({{ var }}) or select(.data[[ var ]])?
What is it about group_by() that makes group_by({{ var }}) wrong but group_by(.data[[ var ]]) right?

Thanks so much!

Matt.

score 1 · Answer 1 · answered Jun 29 '21 at 13:09

It depends on how those functions work and accept input.

If you look at the documentation at ?select, the relevant part for this question is:

These helpers select variables from a character vector:

all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.

any_of(): Same as all_of(), except that no error is thrown for names that don't exist.

So select() accepts character vectors through all_of() and any_of(). Passing the bare vector still works but is ambiguous, which is why you get a note when you run mtcars %>% select(var):

Note: Using an external vector in selections is ambiguous. ℹ Use all_of(var) instead of var to silence this message.

and no note with mtcars %>% select(all_of(var)).
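To make the difference between the two helpers concrete, here is a minimal sketch. It assumes dplyr 1.0.0 or later is installed, and "not_a_column" is a made-up name used only for illustration.

```r
library(dplyr)

var <- "cyl"

# all_of() treats `var` strictly as a character vector of column names.
by_name <- mtcars %>% select(all_of(var))
names(by_name)

# any_of() silently skips names that are not present in the data.
# "not_a_column" is a hypothetical name, not a real mtcars column.
lenient <- mtcars %>% select(any_of(c(var, "not_a_column")))
names(lenient)

# all_of() raises an error on the same input because one name is missing.
strict <- try(
  mtcars %>% select(all_of(c(var, "not_a_column"))),
  silent = TRUE
)
inherits(strict, "try-error")
```

Which helper fits depends on whether a missing column should stop the pipeline or be ignored.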

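group_by() has no such provision. It uses data masking rather than tidy selection, so a string is treated as a value, not as a column name. That is why group_by({{ var }}) groups by the constant "cyl" and returns a single group of 32 rows, and why you need mtcars %>% group_by(.data[[var]]) to look the string up as a column.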
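The options for grouping can be sketched roughly as follows, assuming dplyr 1.0.0 or later. The across(all_of(var)) form is an alternative I am adding that the original answer does not mention, and tally_by() is a hypothetical helper written only to show where {{ }} is intended to be used.

```r
library(dplyr)

var <- "cyl"

# Option 1: the .data pronoun looks the string up as a column of the data.
res_pronoun <- mtcars %>%
  group_by(.data[[var]]) %>%
  tally()

# Option 2: across() brings tidy selection (and so all_of()) into group_by().
res_across <- mtcars %>%
  group_by(across(all_of(var))) %>%
  tally()

# {{ }} is meant for function arguments that hold unquoted column names.
# tally_by() is a hypothetical wrapper, not part of dplyr.
tally_by <- function(data, col) {
  data %>%
    group_by({{ col }}) %>%
    tally()
}
res_embrace <- tally_by(mtcars, cyl)
```

All three should give the same per-cylinder counts as the original pipeline. The third form only helps when the caller passes a bare column name rather than a string.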

And the FAQ for tidyselect mentions "This note will become a warning in the future, and then an error. We have decided to deprecate this particular approach to using external vectors because they introduce ambiguity. Imagine that the data frame contains a column with the same name as your external variable." — SmokeyShakers, Jun 29 '21 at 13:11
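The ambiguity described in that FAQ can be reproduced with a small sketch. The data frame below is hypothetical and built only so that it contains a column literally named var; dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical data frame with a column that is literally named `var`.
df <- tibble(cyl = c(4, 6, 8), var = c("a", "b", "c"))
var <- "cyl"

# The bare name resolves to the column `var`, not to the external vector.
ambiguous <- df %>% select(var)
names(ambiguous)

# all_of() always uses the external character vector, so `cyl` is selected.
explicit <- df %>% select(all_of(var))
names(explicit)
```

The same select(var) call therefore returns different columns depending on what the data happens to contain, which all_of() avoids.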
