Replacing group_by_at(NULL) using across

Question

Before, I used group_by_at to group by a vector of strings or by NULL:

library(tidyverse)

grouping_1 <- c("cyl", "vs")
grouping_2 <- NULL

mtcars %>% group_by_at(grouping_1) 
mtcars %>% group_by_at(grouping_2)

The help page for group_by_at says that the function is superseded and that across should be used instead. However, grouping by NULL gives an error:

mtcars %>% group_by(across(grouping_1)) # this works
mtcars %>% group_by(across(grouping_2)) # this gives an error

For me, group_by_at used in the way described has been useful because in my functions I can use the same code without checking every time whether the grouping argument is empty (NULL) or not.

From `across` documentation: `across()` makes it easy to apply the same transformation to multiple columns, allowing you to use `select()` semantics inside in `summarise()` and `mutate()`. Thus I'm not sure that you can use `across` within a `group_by` statement. — Ric S, Jun 09 '20 at 09:23
This is from the documentation of group_by_all: "Scoped verbs (_if, _at, _all) have been superseded by the use of across() in an existing verb. See vignette("colwise") for details." — danilinares, Jun 09 '20 at 11:58
Also all the examples in the help of group_by_all are examples using across to replace superseded functions. — danilinares, Jun 09 '20 at 12:14
You are right, I didn't check the function `group_by_all`. Thank you for making me notice it — Ric S, Jun 09 '20 at 12:30

TimTeaFan · Answer 1 · 2021-05-30T20:42:40.017

It is still OK to convert the strings to symbols with syms and splice them into group_by using !!!.

library(tidyverse)

grouping_1 <- c("cyl", "vs")
grouping_2 <- NULL

sym_gr_1 <- syms(grouping_1)
sym_gr_2 <- syms(grouping_2)

mtcars %>% group_by(!!! sym_gr_1) # this works

#> # A tibble: 32 x 11
#> # Groups:   cyl, vs [5]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows


mtcars %>% group_by(!!! sym_gr_2) # this works

#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows

Created on 2020-06-20 by the reprex package (v0.3.0)

With dplyr::across(), another option (besides the official approach with all_of, posted in the answer below) is to wrap the vector of variable names in c(). This works even when the object is NULL. However, it produces a note reminding us to use all_of instead.

grouping_1 <- c("cyl", "vs")
grouping_2 <- NULL

mtcars %>% group_by(across(c(grouping_1))) 

#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(grouping_1)` instead of `grouping_1` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.

#> # A tibble: 32 x 11
#> # Groups:   cyl, vs [5]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows

mtcars %>% group_by(across(c(grouping_2)))

#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(grouping_2)` instead of `grouping_2` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.

#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows

Created on 2021-05-30 by the reprex package (v0.3.0)
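Since the question is about reusing the same code inside functions, here is a minimal sketch of how the syms approach might be wrapped. The helper name count_rows_by and its grouping argument are hypothetical and only serve as an illustration; the sketch assumes that rlang::syms(NULL) returns an empty list, so that !!! splices nothing into group_by.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): counts rows per group.
# `grouping` is a character vector of column names, or NULL for no grouping.
count_rows_by <- function(data, grouping = NULL) {
  data %>%
    group_by(!!!rlang::syms(grouping)) %>%  # with NULL, nothing is spliced in
    summarise(n = n(), .groups = "drop")
}

by_cyl_vs <- count_rows_by(mtcars, c("cyl", "vs"))  # one row per cyl/vs combination
overall   <- count_rows_by(mtcars)                  # a single row for the whole data
```

The same function body handles both cases, so no explicit is.null() check is needed.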

score 1 · Answer 2 · answered Aug 01 '20 at 09:01

Using all_of:

library(tidyverse)

grouping_1 <- c("cyl", "vs")
grouping_2 <- NULL

mtcars %>% group_by(across(all_of(grouping_1))) # this works
mtcars %>% group_by(across(all_of(grouping_2))) # this works
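As a rough sketch of how this could look inside a function, which is the use case from the question: the helper name mean_mpg_by and its grouping argument are made up for illustration. The last call also shows a side effect worth knowing about, namely that all_of is strict and raises an error for names that are not columns of the data.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): mean of mpg per group.
# `grouping` is a character vector of column names, or NULL for no grouping.
mean_mpg_by <- function(data, grouping = NULL) {
  data %>%
    group_by(across(all_of(grouping))) %>%
    summarise(mean_mpg = mean(mpg), .groups = "drop")
}

grouped   <- mean_mpg_by(mtcars, c("cyl", "vs"))  # one row per cyl/vs combination
ungrouped <- mean_mpg_by(mtcars)                  # overall mean, single row

# all_of() does not silently ignore unknown names; catch the error here
# only to show that it is raised.
bad <- tryCatch(
  mean_mpg_by(mtcars, c("cyl", "not_a_column")),
  error = function(e) "error"
)
```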

danilinares

I copied it from here: https://github.com/tidyverse/dplyr/issues/5316 – danilinares Aug 01 '20 at 09:02
In some tests that I tried, the code using "syms" was faster than the code using "across". – danilinares Aug 01 '20 at 09:07
