
I'd like to exclude columns of a tibble. I have an exclude expression, stored as a character vector, that mixes possible column names and tidyselect expressions. This is what I tried:

library(tidyverse)
library(rlang)

# my vector of columns and tidyselect expressions
exclude_expression <- c('-name', '-ends_with("_x")', '-id')

# dummy dataframe
# note: column "id" does not exist in the tibble
dat <- 
  tribble(
    ~name,  ~coord_x, ~coord_y,
    "ben",  "1",      "2",
    "anna", "3",      "4"
  )

# select statement where columns should be excluded, if they are present
dat %>% 
  select(
    !!!parse_exprs(exclude_expression)
  )
#> Error: Can't subset columns that don't exist.
#> x Column `id` doesn't exist.

Created on 2022-06-03 by the reprex package (v2.0.0)

Importantly, the pipe should not fail if a column does not exist (unlike in my example). The expected output for my example is:

# A tibble: 2 x 1
  coord_y
  <chr>  
1 2      
2 4
piptoma
  • Can you just put any bare variable names that may or may not exist inside of `any_of()`? i.e., `c('-any_of("name")', '-ends_with("_x")', '-any_of("id")')` – lhs Jun 03 '22 at 14:15
  • This gives error: `dat %>% select(-name, -ends_with("_x"), -id)`. Consequently, the problem is not in passing the expressions to `select`. – PaulS Jun 03 '22 at 14:16
  • This is just a minimal example. In reality the `select()` is used in a loop, and the exclude expression changes for every iteration. – piptoma Jun 03 '22 at 14:18
  • Why use `select` in a for loop? Why not just use `group_by` and solve the problem? Unless you are doing some simulation, avoid for loops when using the tidyverse. – Onyambu Jun 03 '22 at 14:20
  • `dat %>% select(-any_of(c("name", "id")), -ends_with("_x"))` would work, but I want to have the select() statement fixed. – piptoma Jun 03 '22 at 14:20
  • Don't worry about the structure of the code, it is a minimal example. – piptoma Jun 03 '22 at 14:21
  • Just `select(dat, -id)` alone doesn't work, so working with the vector as you have it won't work either. You'd need to parse each expression and turn it into something valid. Where is this string of character expressions coming from? This seems like an [XY Problem](https://xyproblem.info/): the question focuses on patching your attempted solution rather than on the real problem you are trying to solve. – MrFlick Jun 03 '22 at 15:11
  • If `dat %>% select(-any_of(c("name", "id")), -ends_with("_x"))` works for the example but not for your actual problem, then your example is not actually a reproducible example. You should add more detail to show why this wouldn't work in your actual code. – Marcus Jun 03 '22 at 16:07
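lhs's suggestion of wrapping bare names in `any_of()` can also be automated, which is roughly what MrFlick means by parsing each expression and turning it into something valid. Below is a minimal sketch. It assumes every plain column name in the vector appears as a negated symbol such as `-id`. The helper `wrap_bare_names()` is hypothetical and is not part of rlang or tidyselect.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: rewrite a negated bare name such as `-id` into
# `-any_of("id")`, leaving tidyselect helper calls such as
# `-ends_with("_x")` untouched.
wrap_bare_names <- function(exprs) {
  lapply(exprs, function(e) {
    if (is_call(e, "-", n = 1) && is_symbol(e[[2]])) {
      call("-", call2("any_of", as_string(e[[2]])))
    } else {
      e
    }
  })
}

exclude_expression <- c('-name', '-ends_with("_x")', '-id')

dat <- tibble::tribble(
  ~name,  ~coord_x, ~coord_y,
  "ben",  "1",      "2",
  "anna", "3",      "4"
)

# The select() call stays fixed; only exclude_expression would change
res <- dat %>% select(!!!wrap_bare_names(parse_exprs(exclude_expression)))
res
```

Because `any_of()` ignores names that are absent, the missing `id` column no longer raises an error.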

1 Answer


It feels like there is more to the problem than what is being described. For the core requirement, "the pipe should not fail if a column does not exist", you need the tidyselect helper `any_of()`. Almost all tidyselect helpers (`any_of()`, `contains()`, `ends_with()`, ...) take a character vector as input, so these vectors can be redefined on each iteration (as per the comments).

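# names known to be in `dat`, and names that may or may not be
cols_that_exist <- c("name")
cols_might_exist <- c("id")

# any_of() silently skips names that are not present, so `id` causes no error
dat |>
  select(
    -contains(cols_that_exist), -ends_with('_x'), -any_of(cols_might_exist)
  )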

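`contains()` does leave room for partial matching: `contains("name")` would also drop a column such as `surname`. To match the names exactly, you could instead inject the vector using rlang's tidy evaluation (`!!enquo()`):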

dat |>
  select(
    -!!enquo(cols_that_exist), -ends_with('_x'), -any_of(cols_might_exist)
  )
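The sketch below makes the partial-matching caveat concrete. It also shows `all_of()` as another way to get exact matching without `!!enquo()`. The extra `surname` column and the `dat2` tibble are made up for illustration. `all_of()` is intended for names that are expected to exist, which fits `cols_that_exist`.

```r
library(dplyr)

# Hypothetical tibble with an extra `surname` column, to contrast
# partial and exact matching
dat2 <- tibble::tribble(
  ~name,  ~surname, ~coord_y,
  "ben",  "smith",  "2",
  "anna", "jones",  "4"
)

cols_that_exist <- c("name")

# contains() matches substrings, so `surname` is dropped as well
partial <- dat2 |> select(-contains(cols_that_exist))
names(partial)
#> [1] "coord_y"

# all_of() matches whole names only, so `surname` is kept
exact <- dat2 |> select(-all_of(cols_that_exist))
names(exact)
#> [1] "surname" "coord_y"
```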

This should leave your `select()` statement "fixed" while still allowing you to update the criteria vectors on each iteration.
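Here is a minimal sketch of how that might look inside the loop mentioned in the comments. The `select()` call is identical on every iteration and only the criteria change. The `criteria` list and its `exact`/`suffix` fields are hypothetical stand-ins for however the exclusions are actually produced.

```r
library(dplyr)

dat <- tibble::tribble(
  ~name,  ~coord_x, ~coord_y,
  "ben",  "1",      "2",
  "anna", "3",      "4"
)

# Hypothetical per-iteration exclusion criteria: exact names that may or
# may not exist, plus a suffix passed to ends_with()
criteria <- list(
  list(exact = c("name", "id"),     suffix = "_x"),
  list(exact = c("coord_y", "foo"), suffix = "_z")
)

# The select() statement never changes; only `cr` differs per iteration.
# Missing names ("id", "foo") and unmatched suffixes ("_z") are ignored.
results <- lapply(criteria, function(cr) {
  dat |> select(-any_of(cr$exact), -ends_with(cr$suffix))
})

lapply(results, names)
```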

Marcus