
How do I keep a character ID variable, PERSON_ID, unchanged in a recipe? I tried update_role(PERSON_ID, new_role = "id variable") and also tried excluding it from the dummy step with step_dummy(all_nominal_predictors(), -all_numeric_predictors(), -all_outcomes(), -has_role(match = "id variable")). Neither works: PERSON_ID is still converted to a factor. Any suggestions?

poshan
  • Please consider providing a small reproducible example. – akrun Mar 04 '22 at 15:30
  • The conversion from character to factor already happens when you put your data into the recipe function, before you add any steps. Even if you use step_mutate to apply as.character to the variable, it will still be converted to a factor. – Leonhard Geisler Mar 04 '22 at 22:31
  • Thanks @LeonhardGeisler. I am trying to create a workflowset and the factor id variable is blowing up the memory. Any suggestion how to handle it? – poshan Mar 04 '22 at 22:44

1 Answer

This seems to be a confusing one. According to the recipes documentation, step_factor2string should convert factors to strings.

However, when you glimpse the prepped recipe, PERSON_ID is still reported as "fct". On the other hand, if you set strings_as_factors to FALSE, an error appears stating that PERSON_ID is not a factor:

library(tibble)
library(tidymodels)

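# Small example data with one character ID column (PERSON_ID)
data_input <- tibble(
  target    = rep(1, 9),
  num_var   = rep(2, 9),
  char      = c(rep("a", 6), rep("b", 3)),
  PERSON_ID = as.character(c(rep("W", 3), rep("D", 6))),
  logi      = rep(c(TRUE, FALSE, FALSE), 3),
  fac       = as.factor(c(rep("1", 6), rep("2", 3)))
)

# Mark PERSON_ID as an ID variable, dummy-encode the nominal predictors,
# then try to turn PERSON_ID back into a string
recipe_spec <- recipe(target ~ ., data = data_input) %>%
  update_role("PERSON_ID", new_role = "id variable") %>%
  step_dummy(all_nominal_predictors(), -all_numeric_predictors(), -all_outcomes(), -has_role(match = "id variable")) %>%
  step_factor2string(PERSON_ID)

# Default prep(): PERSON_ID is still shown as <fct>
recipe_spec %>% prep() %>% juice() %>% glimpse()

# strings_as_factors = FALSE: errors, stating that PERSON_ID is not a factor
recipe_spec %>% prep(strings_as_factors = FALSE) %>% juice() %>% glimpse()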
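
One possible way around this is sketched here. It assumes you call prep() yourself and that your recipes version still accepts strings_as_factors in prep(), and it does not cover how a workflow or workflowset preps the recipe. The idea is to switch off the automatic conversion and convert only the string predictor explicitly with step_string2factor() before step_dummy(). The names workaround_spec and baked are illustrative only.

```r
library(recipes)

# Reduced version of the example data; PERSON_ID is a character ID column
data_input <- tibble::tibble(
  target    = rep(1, 9),
  char      = c(rep("a", 6), rep("b", 3)),
  PERSON_ID = c(rep("W", 3), rep("D", 6)),
  fac       = factor(c(rep("1", 6), rep("2", 3)))
)

workaround_spec <- recipe(target ~ ., data = data_input) %>%
  # PERSON_ID is no longer a predictor, so all_nominal_predictors() skips it
  update_role(PERSON_ID, new_role = "id variable") %>%
  # Convert the string predictor explicitly instead of relying on prep()
  step_string2factor(char) %>%
  step_dummy(all_nominal_predictors())

# Assumption: strings_as_factors = FALSE leaves untouched character columns as-is
baked <- workaround_spec %>%
  prep(strings_as_factors = FALSE) %>%
  bake(new_data = NULL)

class(baked$PERSON_ID)
```

If this behaves as assumed, PERSON_ID stays a character column while char and fac are replaced by dummy columns. It is worth checking the result with glimpse() on your own data before relying on it.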

 
        
Leonhard Geisler