specify variable names when grouping

Question

I am using dplyr v1.0.2 to manipulate tibbles. I would like to use group_by(), using a function or a regular expression to specify the relevant variable names (the ... argument). The only solution that I've found is clunky. Is there a relatively simple way?

Here is a minimal example that demonstrates the problem:

library(dplyr)
data(iris)
iris[, -(rbinom(1, 1, .5) + 1) ] %>%  # randomly drop "Sepal.Length" or "Sepal.Width"
  group_by(matches("^Sepal\\."))

In the third line, I randomly drop one of the two "Sepal" columns. In the last line, I want to group by the remaining "Sepal" column. The problem is that I don't know its name: it could be either "Sepal.Length" or "Sepal.Width". And the group_by() call in the last line doesn't work: it predictably fails with the error "matches() must be used within a *selecting* function".

By contrast, this code works, but it is a bit clunky:

iris[, -(rbinom(1, 1, .5) + 1) ]  %>%
  group_by(!!as.name(grep('Sepal', colnames(.), val = TRUE)))

Is there a simpler way to do the grouping on the second line?

score 1 · Accepted Answer · answered Dec 22 '20 at 15:57

What about using across() to select the columns? In the run shown here, Sepal.Width happened to be dropped, so the grouping is on Sepal.Length:

iris[, -(rbinom(1, 1, .5) + 1) ]  %>%
  group_by(across(starts_with('Sepal')))

# A tibble: 150 x 4
# Groups:   Sepal.Length [35]
   Sepal.Length Petal.Length Petal.Width Species
          <dbl>        <dbl>       <dbl> <fct>  
 1          5.1          1.4         0.2 setosa 
 2          4.9          1.4         0.2 setosa 
 3          4.7          1.3         0.2 setosa 
 4          4.6          1.5         0.2 setosa 
 5          5            1.4         0.2 setosa 
 6          5.4          1.7         0.4 setosa 
 7          4.6          1.4         0.3 setosa 
 8          5            1.5         0.2 setosa 
 9          4.4          1.4         0.2 setosa 
10          4.9          1.5         0.1 setosa 
# … with 140 more rows
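
Since the question asked about regular expressions, the same pattern should also work with matches() in place of starts_with(). A minimal sketch, assuming dplyr 1.0.0 or later (where across() is available); it drops Sepal.Width deterministically instead of at random so that the result is reproducible, and the names df and grouped are just for illustration:

```r
library(dplyr)

# Drop one "Sepal" column deterministically (the question drops one at random).
df <- as_tibble(iris) %>% select(-Sepal.Width)

# across() provides the selecting context that matches() needs,
# so the regular expression can be used inside group_by().
grouped <- df %>%
  group_by(across(matches("^Sepal\\.")))

# Inspect which columns ended up as grouping variables.
group_vars(grouped)
```

Here group_vars() should report only Sepal.Length, since that is the one remaining column matching the pattern.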

answered Dec 22 '20 at 15:57

Agaz Wani

Thank you. I didn't realize that `across()` can work easily with ordinary functions like `grep()`, too, but it does: `iris[, -(rbinom(1, 1, .5) + 1) ] %>% group_by(across(grep('^Sepal', ., val = TRUE)))` produces the same result. This is still not as clean as I would like, but it's a step in the right direction. – user697473 Dec 22 '20 at 16:01
@user697473 when you say *not as clean as I would like*, what exactly do you want? – Onyambu Dec 22 '20 at 16:51
Something that doesn't require three closing parentheses. Ideally, a single function inside `group_by()`, rather than one function nested in another. I doubt that it's possible, but perhaps I'm wrong about that. – user697473 Dec 22 '20 at 17:10
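
If the nesting itself is the objection, one option is to hide it in a small helper of your own. A minimal sketch: group_by_matches() is a hypothetical name, not a dplyr function, and it wraps group_by() rather than going inside it:

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): group a data frame by every
# column whose name matches the regular expression `pattern`.
group_by_matches <- function(.data, pattern) {
  group_by(.data, across(matches(pattern)))
}

# Usage: drop one "Sepal" column, then group by whichever one remains.
result <- as_tibble(iris) %>%
  select(-Sepal.Length) %>%
  group_by_matches("^Sepal\\.")

group_vars(result)
```

The pipeline then needs only one call with one closing parenthesis, at the cost of defining the helper once. If the pattern matches several columns, all of them become grouping variables.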
