How can I use dplyr across() programmatically on no variables?

Question

Issue:

I want to use across() programmatically so that the function won't fail if, e.g., NULL or an empty string is passed to it. This is possible using scoped variants of functions such as group_by_at(), but I'd like to make it work neatly (i.e. without if-statements) using across().

Note also that currently across() will affect all columns if left empty. I'm unsure what the motivation for this is; to me it would make more sense if no columns were affected.

Example

Here's a quick example using functions to calculate the mean of a variable y. Passing a grouping variable works with group_by_at(), but not with across() as shown:

library(dplyr)

my_df <- tibble(x = c("a", "a", "b", "b"), y = 1:4)

compute_mean1 <- function(df, grouping) { # compute grouped mean with across()
  df %>% 
    group_by(across(all_of(grouping))) %>% 
    summarise(y = mean(y), .groups = "drop")
}

compute_mean2 <- function(df, grouping) { # compute grouped mean with group_by_at()
  df %>% 
    group_by_at(grouping) %>% 
    summarise(y = mean(y), .groups = "drop")
}


compute_mean1(my_df, "x")
#> # A tibble: 2 x 2
#>   x         y
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3.5
compute_mean1(my_df, NULL)
#> Error: `vars` must be a character vector.
compute_mean2(my_df, "x")
#> # A tibble: 2 x 2
#>   x         y
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3.5
compute_mean2(my_df, NULL)
#> # A tibble: 1 x 1
#>       y
#>   <dbl>
#> 1   2.5

Created on 2020-07-14 by the reprex package (v0.3.0)

score 5 · Accepted Answer · answered Jul 14 '20 at 17:57

Use .add=TRUE like this:

compute_mean3 <- function(df, grouping) { # compute grouped mean with across()
  df %>% 
    group_by(across(all_of(grouping)), .add = TRUE) %>%
    summarise(y = mean(y), .groups = "drop")
}
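As a quick check, here is a minimal sketch that runs the function above on the data frame from the question, once with a grouping column and once with NULL. It assumes a dplyr version in which across() is accepted inside group_by(). No output is shown because it was not re-run for this post, but both calls should give the same results as compute_mean2() in the question.

```r
library(dplyr)

# Same data as in the question
my_df <- tibble(x = c("a", "a", "b", "b"), y = 1:4)

# Function from the answer, repeated so this snippet runs on its own
compute_mean3 <- function(df, grouping) {
  df %>%
    group_by(across(all_of(grouping)), .add = TRUE) %>%
    summarise(y = mean(y), .groups = "drop")
}

# Grouped by x: one row per level of x
res_grouped <- compute_mean3(my_df, "x")

# No grouping variables: a single row with the overall mean of y
res_ungrouped <- compute_mean3(my_df, NULL)

print(res_grouped)
print(res_ungrouped)
```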

G. Grothendieck

I upvoted the answer, but if you'll permit me, two questions. #1 It definitely works for `grouping = NULL` but not for `grouping = ""`. Is that possible to overcome? #2 How on earth did you deduce that from the doco? `.add` says **"will override existing groups. To add to the existing groups, use .add = TRUE"**, but there are no existing groups. – Chuck P Jul 14 '20 at 18:40
I just played around with it to find out how it works, so it is possible that what is shown in the answer is not really supported but just happens to work. I suggest you open a GitHub issue. Regarding the use of a zero-length string to denote no columns: check the argument and convert it to NULL prior to the pipeline if you want to support that. – G. Grothendieck Jul 14 '20 at 19:05
Thank you. I'm not sure it needs to be supported; I'm just trying to learn, and I appreciate your response. I used to rely on all the _if, _at etc. variants. `across` is still weird to me. – Chuck P Jul 14 '20 at 20:15
I am not really sure whether across itself is officially supported in group_by. The across help page says it works in mutate and select. There is no mention of group_by. – G. Grothendieck Jul 15 '20 at 02:43
The reference page for `across` is ambiguous, I'd say. It only specifically mentions `summarise` and `mutate`, but it also says it supersedes the scoped variants (ending with `_at`, `_if`, and `_all`), which `group_by` obviously has. – wurli Jul 15 '20 at 08:29
I have added issues 5407 and 5408 to the dplyr github page. – G. Grothendieck Jul 15 '20 at 12:37
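
To illustrate the suggestion in the comments about converting a zero-length string to NULL before the pipeline, here is a minimal sketch. The helper drop_empty_grouping() and the wrapper compute_mean4() are hypothetical names made up for this example, not part of dplyr. The sketch also assumes, as the answer does, that group_by(across(...), .add = TRUE) accepts an empty selection in your dplyr version.

```r
library(dplyr)

my_df <- tibble(x = c("a", "a", "b", "b"), y = 1:4)

# Hypothetical helper: drop empty strings and NAs from the grouping vector
# and return NULL when nothing is left, so "" behaves like NULL.
drop_empty_grouping <- function(grouping) {
  grouping <- grouping[!is.na(grouping) & nzchar(grouping)]
  if (length(grouping) == 0) NULL else grouping
}

# Hypothetical wrapper: clean the argument first, then use the same
# pipeline as in the answer.
compute_mean4 <- function(df, grouping) {
  grouping <- drop_empty_grouping(grouping)
  df %>%
    group_by(across(all_of(grouping)), .add = TRUE) %>%
    summarise(y = mean(y), .groups = "drop")
}

res_empty <- compute_mean4(my_df, "")    # treated like NULL
res_null  <- compute_mean4(my_df, NULL)
res_x     <- compute_mean4(my_df, "x")
```

The if-statement lives in the helper rather than in the pipeline itself, so the body of the pipeline stays the same as in the answer.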
