
I would like to use case_when from dplyr in order to select a column to change its role for a tidymodels recipe.

What am I doing wrong? In the following MWE an ID-role should be assigned to the column "b":

library(tidyverse)
library(tidymodels)

# dummy data
a = 1:3
b = 4:6
c = 7:9
df <- data.frame(a,b,c)

# filter variable
col_name = "foo"

rec <- recipe(a ~., data = df) %>%
  update_role(
              case_when(
                col_name == "foo" ~ b, # Not working too: .$b, df$b
                col_name == "foo2" ~ c), 
              new_role = "ID")
rec
stefan
Roland

2 Answers


Unfortunately, case_when is not meant for the kind of dynamic variable selection you are trying to achieve: it is a vectorised function that returns values, whereas update_role expects a column selection. Instead, I would suggest wrapping an if (...) inside a function to perform the dynamic selection:

library(tidyverse)
library(tidymodels)

# dummy data
a = 1:3
b = 4:6
c = 7:9
df <- data.frame(a,b,c)

# filter variable
col_name = "foo"

update_select <- function(recipe, col_name) {
  if (col_name == "foo") {
    update_role(recipe, b, new_role = "ID")
  } else if (col_name == "foo2") {
    update_role(recipe, c, new_role = "ID")
  } else {
    # no match: return the recipe unchanged instead of NULL
    recipe
  }
}

rec <- recipe(a ~., data = df) %>%
  update_select(col_name)
rec
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>         ID          1
#>    outcome          1
#>  predictor          1
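
If you still want to keep case_when, one possible variation is to let it return the column name as a string and hand that string to all_of(). This is only a sketch under that assumption, not part of the approach above; the variable name picked is made up for illustration. Note that when no condition matches, case_when returns NA, which all_of() will reject, so a fallback would be needed in real code.

```r
library(recipes)

# dummy data
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# filter variable
col_name <- "foo"

# case_when() evaluates to a plain value, here a column name as a string
# ("picked" is a hypothetical helper variable)
picked <- dplyr::case_when(
  col_name == "foo"  ~ "b",
  col_name == "foo2" ~ "c"
)

# all_of() turns the character vector into a column selection
rec <- recipe(a ~ ., data = df) %>%
  update_role(all_of(picked), new_role = "ID")

# inspect the roles assigned to each variable
summary(rec)
```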
stefan

There are a couple of different ways to do this. I think with the example you show here, I would use a named vector that has the column names:

library(recipes)

# dummy data
a = 1:3
b = 4:6
c = 7:9
df <- data.frame(a,b,c)

selector_vec <- c("foo" = "b", "foo2" = "c")

## could select more than one term here
my_terms <- selector_vec[["foo"]]
rec1 <- recipe(a ~ ., data = df) %>%
  update_role(all_of(my_terms), new_role = "ID")
prep(rec1)$term_info
#> # A tibble: 3 x 4
#>   variable type    role      source  
#>   <chr>    <chr>   <chr>     <chr>   
#> 1 b        numeric ID        original
#> 2 c        numeric predictor original
#> 3 a        numeric outcome   original

my_terms <- selector_vec[["foo2"]]
rec2 <- recipe(a ~ ., data = df) %>%
  update_role(all_of(my_terms), new_role = "ID")
prep(rec2)$term_info
#> # A tibble: 3 x 4
#>   variable type    role      source  
#>   <chr>    <chr>   <chr>     <chr>   
#> 1 b        numeric predictor original
#> 2 c        numeric ID        original
#> 3 a        numeric outcome   original

Created on 2021-05-24 by the reprex package (v2.0.0)
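
As the comment in the code above hints, the same lookup can hand back more than one column. A minimal sketch, assuming a named list instead of a named vector so that one key can map to several column names; the extra column d and the name selector_list are hypothetical and only there for illustration:

```r
library(recipes)

# dummy data with one extra (hypothetical) column d
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# a named list lets a single key map to several columns
selector_list <- list(foo = "b", foo2 = c("c", "d"))

my_terms <- selector_list[["foo2"]]
rec3 <- recipe(a ~ ., data = df) %>%
  update_role(all_of(my_terms), new_role = "ID")

# c and d should now carry the ID role, b stays a predictor
summary(rec3)
```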

In what might be considered a more realistic situation, I would use across() as shown here.

Julia Silge