
I want to use filter() from {dplyr} together with selection helpers from {tidyselect}. For example, I may end up with any of the following three data frames:

df_1 <-
  tribble(~identification_number, ~value,
          1, "a",
          2, "b",
          3, "c",
          4, "d",
          5, "e")

df_2 <-
  tribble(~identification_hash, ~value,
          1, "f",
          2, "g",
          3, "h",
          4, "i",
          5, "j")

df_3 <-
  tribble(~identification_WHATEVER, ~value,
          1, "t",
          2, "u",
          3, "v",
          4, "w",
          5, "x")

If I have the identification values for which I want to filter any given data:

ids_to_keep <- c(2, 4, 5)

then the manual (but undesirable) way is to handle each data frame separately:

df_1 %>%
  filter(identification_number %in% ids_to_keep)

df_2 %>%
  filter(identification_hash %in% ids_to_keep)

df_3 %>%
  filter(identification_WHATEVER %in% ids_to_keep)

But since I can't anticipate the exact column name that will appear in the data, I need a robust way to say: filter the data according to the column whose name starts with "identification".

From this answer I found that the following works great:

df_1 %>% # or df_2 or df_3
  filter_at(vars(starts_with("identification")), all_vars(. %in% ids_to_keep))

However, filter_at() is superseded. What is the canonical way to do the same in dplyr v 1.0.7?

Emman

1 Answer


The _at, _if, and _all scoped variants have been superseded. In their place, dplyr 1.0.0 added a helper function called across() that works inside dplyr verbs such as filter(), mutate(), and summarize(). across(), in turn, accepts tidyselect syntax, which makes it easy to select columns programmatically. It also makes it easy to combine single-column operations (as in earlier versions of dplyr) with across() operations that affect multiple columns.
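To illustrate combining a single-column operation with a multi-column one, here is a minimal sketch on the built-in mtcars data. The result name by_cyl is only illustrative; it is assumed that starts_with("d") matches the disp and drat columns, which is the case for the standard mtcars data set.

```r
library(dplyr)

# Per-cylinder summary that mixes a plain single-column expression (n)
# with an across() call over every column whose name starts with "d".
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n = n(),                         # ordinary single-column summary
    across(starts_with("d"), mean)   # mean of disp and drat
  )

by_cyl
```

The across() call sits next to the ordinary n = n() expression inside the same summarise(), which is the kind of mixing the old scoped variants did not allow.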

Using the "mtcars" data set as an example, you can use across and starts_with. Here I'm selecting the "mpg" column using just its first two letters:

filter(mtcars, across(starts_with('mp'), ~ . > 30))

                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2

The filter for your data sets would look like:

df_1 %>% filter(across(starts_with("identification"), ~ . %in% ids_to_keep))
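As a possible alternative, dplyr 1.0.4 and later (so including 1.0.7) also provide if_all() and if_any(), which are intended for exactly this kind of predicate inside filter(); as far as I know, later dplyr releases deprecate across() inside filter() in their favor. The sketch below wraps the filter in a small helper. The function name filter_by_id and its prefix argument are hypothetical, introduced only for illustration.

```r
library(dplyr)
library(tibble)

ids_to_keep <- c(2, 4, 5)

df_2 <- tribble(
  ~identification_hash, ~value,
  1, "f",
  2, "g",
  3, "h",
  4, "i",
  5, "j"
)

# Hypothetical helper: keep the rows where every column whose name
# starts with `prefix` holds a value found in `ids`.
filter_by_id <- function(data, ids, prefix = "identification") {
  data %>%
    filter(if_all(starts_with(prefix), ~ .x %in% ids))
}

kept <- filter_by_id(df_2, ids_to_keep)
kept
```

if_all() mirrors the all_vars() behaviour of the original filter_at() call; if several columns match the prefix and a row should be kept when any one of them matches, if_any() is the counterpart to any_vars().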
jdobres