
I was reading a tutorial for tidymodels and came across the following code block:

lr_recipe <- 
  recipe(children ~ ., data = hotel_other) %>% 
  step_date(arrival_date) %>% 
  step_holiday(arrival_date, holidays = holidays) %>% 
  step_rm(arrival_date) %>% 
  step_dummy(all_nominal_predictors()) %>% 
  step_zv(all_predictors()) %>% 
  step_normalize(all_predictors())

(This is the source of the code: https://www.tidymodels.org/start/case-study/#first-model)

Basically, the code lists a set of pre-processing operations on the predictors and stores them in a recipe object. My question arises from the following: first, step_dummy(all_nominal_predictors()) one-hot encodes the categorical predictors. Then, in a later step, step_normalize(all_predictors()) applies centering and scaling to all predictors (and therefore also to the encoded categorical ones).

I am used to training models directly on one-hot encoded categorical predictors, without processing them further through a normalizing step. What is the advantage of normalizing one-hot encoded predictors? And how does it affect the interpretability of the model when predictions are made? Thanks for any clarification.
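For reference, here is a minimal, self-contained sketch (using a made-up toy data frame rather than the hotel data) of what I understand step_normalize() to do to the dummy columns created by step_dummy():

library(tidymodels)

# Toy data frame, invented for illustration (not hotel_other).
toy <- tibble(
  y   = c(10, 12, 9, 14, 11, 13),
  x1  = c(1.2, 3.4, 2.2, 5.1, 0.7, 4.0),
  col = factor(c("red", "blue", "red", "green", "blue", "green"))
)

toy_rec <-
  recipe(y ~ ., data = toy) %>%
  step_dummy(all_nominal_predictors()) %>%   # col -> col_green, col_red (0/1)
  step_normalize(all_predictors())           # centers/scales x1 AND the dummy columns

toy_rec %>% prep() %>% bake(new_data = NULL)
# The former 0/1 dummy columns now have mean 0 and sd 1; each one takes
# two distinct values instead of 0 and 1.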

PiMas
1 Answer


If the binary variables are the only predictors, the set of predictors is already standardized (they are all on the same units/scale), so there is no need to do anything else.
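As a quick illustration on made-up data (the column names are just placeholders), 0/1 dummy columns already share the same units and range, while a raw numeric predictor typically does not; the latter is what step_normalize(all_predictors()) addresses when the predictor set is mixed:

# Made-up predictors: two 0/1 dummy-style columns and one numeric column
# on a much larger scale (names are illustrative only).
toy <- data.frame(
  is_weekend = c(0, 1, 0, 0, 1, 1),
  is_holiday = c(0, 0, 1, 0, 0, 1),
  lead_time  = c(3, 120, 45, 210, 7, 90)
)

sapply(toy, sd)     # the 0/1 columns have comparable spread; lead_time does not
sapply(toy, range)  # the 0/1 columns are already bounded to [0, 1]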

topepo
  • Thanks for your answer. The binary predictors in this case are not the only features in the dataset. Also, I would be curious about the general approach with binary variables: when (if ever) does it make sense to standardize them? – PiMas Mar 28 '23 at 08:35