
I am using dplyr v1.0.2 to manipulate tibbles. I would like to use group_by(), using a function or a regular expression to specify the relevant variable names (the ... argument). The only solution that I've found is clunky. Is there a relatively simple way?

Here is a minimal example that demonstrates the problem:

library(dplyr)
data(iris)
iris[, -(rbinom(1, 1, .5) + 1) ] %>%  # randomly drop "Sepal.Length" or "Sepal.Width"
  group_by(matches("^Sepal\\."))

In the third line, I randomly drop one of the two "Sepal" columns. In the last line, I want to group by the remaining "Sepal" column. The problem is that I don't know its name: it could be either "Sepal.Length" or "Sepal.Width". And the group_by() call in the last line doesn't work: it predictably fails with the error "matches() must be used within a *selecting* function".

By contrast, this code works, but it is a bit clunky:

iris[, -(rbinom(1, 1, .5) + 1) ]  %>%
  group_by(!!as.name(grep('Sepal', colnames(.), val = TRUE)))

Is there a simpler way to do the grouping on the second line?

Wiktor Stribiżew
user697473

1 Answer

What about using across() to select the columns? It accepts tidyselect helpers such as starts_with(), so it can be used inside group_by():

iris[, -(rbinom(1, 1, .5) + 1) ]  %>%
  group_by(across(starts_with('Sepal')))

# A tibble: 150 x 4
# Groups:   Sepal.Length [35]
   Sepal.Length Petal.Length Petal.Width Species
          <dbl>        <dbl>       <dbl> <fct>  
 1          5.1          1.4         0.2 setosa 
 2          4.9          1.4         0.2 setosa 
 3          4.7          1.3         0.2 setosa 
 4          4.6          1.5         0.2 setosa 
 5          5            1.4         0.2 setosa 
 6          5.4          1.7         0.4 setosa 
 7          4.6          1.4         0.3 setosa 
 8          5            1.5         0.2 setosa 
 9          4.4          1.4         0.2 setosa 
10          4.9          1.5         0.1 setosa 
# … with 140 more rows
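
Since the question used a regular expression, here is a minimal sketch of the same idea with matches() in place of starts_with(). To keep the result reproducible, it assumes Sepal.Width is the column that was dropped (a fixed stand-in for the random drop in the question); the names df and grouped are only illustrative.

```r
library(dplyr)

# Assumption: deterministic stand-in for the random drop in the question.
# Column 2 of iris is Sepal.Width, so only Sepal.Length remains.
df <- iris[, -2]

# across() is a selecting context, so matches() is allowed here
grouped <- df %>%
  group_by(across(matches("^Sepal\\.")))

# Inspect which column(s) ended up as grouping variables
group_vars(grouped)
```

If the other column had been dropped instead, the same call should group by Sepal.Width without any change to the code.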
Agaz Wani
  • Thank you. I didn't realize that `across()` can work easily with ordinary functions like `grep()`, too, but it does: `iris[, -(rbinom(1, 1, .5) + 1) ] %>% group_by(across(grep('^Sepal', ., val = TRUE)))` produces the same result. This is still not as clean as I would like, but it's a step in the right direction. – user697473 Dec 22 '20 at 16:01
  • @user697473 When you say *not as clean as I would like*, what exactly do you want? – Onyambu Dec 22 '20 at 16:51
  • Something that doesn't require three closing parentheses. Ideally, a single function inside `group_by()`, rather than one function nested in another. I doubt that it's possible, but perhaps I'm wrong about that. – user697473 Dec 22 '20 at 17:10
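
One way to get closer to a single call, sketched here under assumptions rather than offered as a dplyr feature, is to hide the nesting in a small wrapper. The name group_by_matching is hypothetical and not part of dplyr; it simply forwards a regular expression to across(matches()).

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): group by all columns whose
# names match a regular expression.
group_by_matching <- function(data, pattern) {
  group_by(data, across(matches(pattern)))
}

# Assumption: Sepal.Width (column 2) is the dropped column, for reproducibility
res <- iris[, -2] %>%
  group_by_matching("^Sepal\\.")

group_vars(res)
```

The nesting still exists inside the wrapper, so this only moves the extra parentheses out of the pipeline rather than removing them.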