
I'm new to recipes and having some issues with the API. Why can't I bake or juice my recipe steps when I've removed certain features that I'm not interested in?

set.seed(999)
train_test_split <- initial_split(mtcars)

mtcars_train <- training(train_test_split)
mtcars_test <- testing(train_test_split)

mtcars_train %>%
    recipe(mpg ~ cyl + disp + hp + gear) %>% 
    step_rm(qsec, vs, carb) %>% 
    step_center(all_numeric())  %>%
    step_scale(all_numeric()) %>%
    prep(training = mtcars_train)

results in:

Error in .f(.x[[i]], ...) : object 'qsec' not found

Which is pretty annoying, because it means I'll need to remove the columns manually from both the train and test sets after the steps have been applied:

rec_scale <- mtcars %>%
    recipe(mpg ~ cyl + disp + hp + gear) %>% 
    step_center(all_numeric())  %>%
    step_scale(all_numeric()) %>%
    prep(training = mtcars_train)
train <- juice(rec_scale) %>%
  select(-qsec, -vs, -carb)
test <- bake(rec_scale, mtcars_test) %>%
  select(-qsec, -vs, -carb)

Am I thinking about this wrong? I could alternatively filter beforehand, but I would think that my recipe should include things like that.

Zafar

1 Answer


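You should include every column used in a recipe's steps in the recipe() call; a column can't be removed if it isn't in the recipe. Here the formula already limits the recipe to mpg, cyl, disp, hp, and gear, so qsec, vs, and carb are never part of it and step_rm() isn't needed.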

library(tidymodels)
#> ── Attaching packages ────────────────────────────── tidymodels 0.0.2 ──
#> ✔ broom     0.5.2       ✔ purrr     0.3.2  
#> ✔ dials     0.0.2       ✔ recipes   0.1.6  
#> ✔ dplyr     0.8.3       ✔ rsample   0.0.5  
#> ✔ ggplot2   3.2.0       ✔ tibble    2.1.3  
#> ✔ infer     0.4.0.1     ✔ yardstick 0.0.3  
#> ✔ parsnip   0.0.3
#> ── Conflicts ───────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()

set.seed(999)
train_test_split <- initial_split(mtcars)

mtcars_train <- training(train_test_split)
mtcars_test <- testing(train_test_split)

rec <- 
  mtcars_train %>%
  recipe(mpg ~ cyl + disp + hp + gear) %>% 
  step_center(all_numeric())  %>%
  step_scale(all_numeric()) %>%
  prep(training = mtcars_train)

summary(rec)
#> # A tibble: 5 x 4
#>   variable type    role      source  
#>   <chr>    <chr>   <chr>     <chr>   
#> 1 cyl      numeric predictor original
#> 2 disp     numeric predictor original
#> 3 hp       numeric predictor original
#> 4 gear     numeric predictor original
#> 5 mpg      numeric outcome   original

Created on 2019-08-04 by the reprex package (v0.2.1)
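
If you would rather have the recipe itself record the column removal, one option is to start from a formula that includes every column and then drop the unwanted ones with step_rm(). The following is a minimal sketch under that assumption, reusing the same mtcars split; the names rec_rm, train_baked, and test_baked are made up for illustration, and this is not necessarily the approach the package authors would recommend.

```r
library(recipes)
library(rsample)

set.seed(999)
train_test_split <- initial_split(mtcars)
mtcars_train <- training(train_test_split)
mtcars_test <- testing(train_test_split)

# `mpg ~ .` registers every other column as a predictor,
# so step_rm() can find qsec, vs, and carb.
rec_rm <-
  recipe(mpg ~ ., data = mtcars_train) %>%
  step_rm(qsec, vs, carb) %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  prep(training = mtcars_train)

# The removal is now applied the same way to both data sets.
train_baked <- juice(rec_rm)
test_baked <- bake(rec_rm, new_data = mtcars_test)

names(test_baked)
```

Because `mpg ~ .` brings in every column, this version keeps drat, wt, and am as predictors too; list them in step_rm() as well if only cyl, disp, hp, and gear are wanted.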

topepo
  • My issue is that `recipes` should let me codify the transformations of my data frames. That's what it's for, right? The fact that it only lets me change what's in the recipe formula is pretty limiting. That aside, thanks for all the work with tidymodels, Max. – Zafar Dec 18 '19 at 21:53