Folks I have a couple of questions about how tidy evaluation works with dplyr
The following code produces a tally of cars by cylinder using the mtcars dataset:
mtcars %>%
select(cyl) %>%
group_by(cyl) %>%
tally()
With output as expected:
# A tibble: 3 x 2
cyl n
* <dbl> <int>
1 4 11
2 6 7
3 8 14
If I want to pass the grouping factor as variable, then this fails:
var <- "cyl"
mtcars %>%
select(var) %>%
group_by(var) %>%
tally()
with error message:
Error: Must group by variables found in `.data`.
* Column `var` is not found.
This also fails:
var <- "cyl"
mtcars %>%
select(var) %>%
group_by({{ var}}) %>%
tally()
Producing output:
# A tibble: 1 x 2
`"cyl"` n
* <chr> <int>
1 cyl 32
This code, however, works as expected:
var <- "cyl"
mtcars %>%
select(var) %>%
group_by(.data[[ var]]) %>%
tally()
Producing the expected output:
# A tibble: 3 x 2
cyl n
* <dbl> <int>
1 4 11
2 6 7
3 8 14
I have two questions about this and wondering if someone can help!
Why does
select(var)
work fine without using any of the dplyr tidy evaluation extensions, such asselect({{ var }})
orselect(.data[[ var ]])
?What is is about
group_by()
that makesgroup_by({{ var }})
wrong butgroup_by(.data[[ var ]])
right?
Thanks so much!
Matt.