To handle a minority positive class in the project I'm working on, I'm using step_downsample() in my recipe. I'm also using 10-fold cross-validation to mitigate bias. When I use a workflow to bundle the learner, the recipe, a grid search, and the CV folds, does the workflow apply the recipe steps to each individual fold before model training? The order of operations is hazy to me, and I wasn't able to find a satisfactory answer in the documentation. Thanks!
Jeffrey Brabec
1 Answer
I think you might find this chapter helpful, especially the section "Where does the model begin and end?".
Yes, in tidymodels, the preprocessing recipe (i.e. feature engineering procedure) is considered part of the modeling process and is trained on each fold like the learner.
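For the setup described in the question (a workflow, a recipe with step_downsample(), and 10-fold cross-validation), a minimal sketch might look like the following. The Ionosphere data, the SVM learner, and grid = 5 are illustrative assumptions, not the asker's actual project.

```r
library(tidymodels)
library(themis)

data(Ionosphere, package = "mlbench")

# Learner with two tunable parameters (illustrative choice)
svm_mod <-
  svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
  set_mode("classification") %>%
  set_engine("kernlab")

# Recipe that ends with downsampling of the outcome classes
iono_rec <-
  recipe(Class ~ ., data = Ionosphere) %>%
  step_zv(all_predictors()) %>%
  step_lincomb(all_numeric()) %>%
  step_downsample(Class)

# Bundle the recipe and the learner
iono_wf <-
  workflow() %>%
  add_recipe(iono_rec) %>%
  add_model(svm_mod)

# 10-fold cross-validation, stratified on the outcome
set.seed(123)
iono_folds <- vfold_cv(Ionosphere, v = 10, strata = Class)

# grid = 5 is an arbitrary, assumed number of candidate parameter sets
set.seed(325)
iono_res <- tune_grid(iono_wf, resamples = iono_folds, grid = 5)

collect_metrics(iono_res)
```

The recipe is passed to tune_grid() unprepped; it is estimated inside each fold rather than once on the full data.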
You can see this happening in the logging if you set verbose = TRUE in control_grid() during tuning:
library(tidymodels)
library(themis)
#>
#> Attaching package: 'themis'
#> The following objects are masked from 'package:recipes':
#>
#> step_downsample, step_upsample

data(Ionosphere, package = "mlbench")

svm_mod <-
  svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
  set_mode("classification") %>%
  set_engine("kernlab")

iono_rec <-
  recipe(Class ~ ., data = Ionosphere) %>%
  # remove any zero variance predictors
  step_zv(all_predictors()) %>%
  # remove any linear combinations
  step_lincomb(all_numeric()) %>%
  step_downsample(Class)

set.seed(123)
iono_rs <- bootstraps(Ionosphere, times = 5)

set.seed(325)
svm_mod %>%
  tune_grid(
    iono_rec,
    resamples = iono_rs,
    control = control_grid(verbose = TRUE)
  )
#> i Bootstrap1: preprocessor 1/1
#> ✓ Bootstrap1: preprocessor 1/1
#> i Bootstrap1: preprocessor 1/1, model 1/10
#> ✓ Bootstrap1: preprocessor 1/1, model 1/10
#> i Bootstrap1: preprocessor 1/1, model 1/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 2/10
#> ✓ Bootstrap1: preprocessor 1/1, model 2/10
#> i Bootstrap1: preprocessor 1/1, model 2/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 3/10
#> ✓ Bootstrap1: preprocessor 1/1, model 3/10
#> i Bootstrap1: preprocessor 1/1, model 3/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 4/10
#> ✓ Bootstrap1: preprocessor 1/1, model 4/10
#> i Bootstrap1: preprocessor 1/1, model 4/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 5/10
#> ✓ Bootstrap1: preprocessor 1/1, model 5/10
#> i Bootstrap1: preprocessor 1/1, model 5/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 6/10
#> ✓ Bootstrap1: preprocessor 1/1, model 6/10
#> i Bootstrap1: preprocessor 1/1, model 6/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 7/10
#> ✓ Bootstrap1: preprocessor 1/1, model 7/10
#> i Bootstrap1: preprocessor 1/1, model 7/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 8/10
#> ✓ Bootstrap1: preprocessor 1/1, model 8/10
#> i Bootstrap1: preprocessor 1/1, model 8/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 9/10
#> ✓ Bootstrap1: preprocessor 1/1, model 9/10
#> i Bootstrap1: preprocessor 1/1, model 9/10 (predictions)
#> i Bootstrap1: preprocessor 1/1, model 10/10
#> ✓ Bootstrap1: preprocessor 1/1, model 10/10
#> i Bootstrap1: preprocessor 1/1, model 10/10 (predictions)
#> i Bootstrap2: preprocessor 1/1
#> ✓ Bootstrap2: preprocessor 1/1
#> i Bootstrap2: preprocessor 1/1, model 1/10
#> ✓ Bootstrap2: preprocessor 1/1, model 1/10
#> i Bootstrap2: preprocessor 1/1, model 1/10 (predictions)
#> i Bootstrap2: preprocessor 1/1, model 2/10
#> ✓ Bootstrap2: preprocessor 1/1, model 2/10
#> i Bootstrap2: preprocessor 1/1, model 2/10 (predictions)
#> i Bootstrap2: preprocessor 1/1, model 3/10
#> ✓ Bootstrap2: preprocessor 1/1, model 3/10
#> i Bootstrap2: preprocessor 1/1, model 3/10 (predictions)
#> i Bootstrap2: preprocessor 1/1, model 4/10
#> ✓ Bootstrap2: preprocessor 1/1, model 4/10
#> i Bootstrap2: preprocessor 1/1, model 4/10 (predictions)
#> i Bootstrap2: preprocessor 1/1, model 5/10
#> ✓ Bootstrap2: preprocessor 1/1, model 5/10
#> i Bootstrap2: preprocessor 1/1, model 5/10 (predictions)
#> i Bootstrap2: preprocessor 1/1, model 6/10
#> ✓ Bootstrap2: preprocessor 1/1, model 6/10
#> i Bootstrap2: preprocessor 1/1, model 6/10 (predictions)
#> i Bootstrap2: preprocessor 1/1, model 7/10
#> ✓ Bootstrap2: preprocessor 1/1, model 7/10
#> i Bootstrap2: preprocessor 1/1, model 7/10 (predictions)
#> i Bootstrap2: preprocessor 1/1, model 8/10
#> ✓ Bootstrap2: preprocessor 1/1, model 8/10
#> i Bootstrap2: preprocessor 1/1, model 8/10 (predictions)
#> i Bootstrap2: preprocessor 1/1, model 9/10
#> ✓ Bootstrap2: preprocessor 1/1, model 9/10
#> i Bootstrap2: preprocessor 1/1, model 9/10 (predictions)
#> i Bootstrap2: preprocessor 1/1, model 10/10
#> ✓ Bootstrap2: preprocessor 1/1, model 10/10
#> i Bootstrap2: preprocessor 1/1, model 10/10 (predictions)
#> i Bootstrap3: preprocessor 1/1
#> ✓ Bootstrap3: preprocessor 1/1
#> i Bootstrap3: preprocessor 1/1, model 1/10
#> ✓ Bootstrap3: preprocessor 1/1, model 1/10
#> i Bootstrap3: preprocessor 1/1, model 1/10 (predictions)
#> i Bootstrap3: preprocessor 1/1, model 2/10
#> ✓ Bootstrap3: preprocessor 1/1, model 2/10
#> i Bootstrap3: preprocessor 1/1, model 2/10 (predictions)
#> i Bootstrap3: preprocessor 1/1, model 3/10
#> ✓ Bootstrap3: preprocessor 1/1, model 3/10
#> i Bootstrap3: preprocessor 1/1, model 3/10 (predictions)
#> i Bootstrap3: preprocessor 1/1, model 4/10
#> ✓ Bootstrap3: preprocessor 1/1, model 4/10
#> i Bootstrap3: preprocessor 1/1, model 4/10 (predictions)
#> i Bootstrap3: preprocessor 1/1, model 5/10
#> ✓ Bootstrap3: preprocessor 1/1, model 5/10
#> i Bootstrap3: preprocessor 1/1, model 5/10 (predictions)
#> i Bootstrap3: preprocessor 1/1, model 6/10
#> ✓ Bootstrap3: preprocessor 1/1, model 6/10
#> i Bootstrap3: preprocessor 1/1, model 6/10 (predictions)
#> i Bootstrap3: preprocessor 1/1, model 7/10
#> ✓ Bootstrap3: preprocessor 1/1, model 7/10
#> i Bootstrap3: preprocessor 1/1, model 7/10 (predictions)
#> i Bootstrap3: preprocessor 1/1, model 8/10
#> ✓ Bootstrap3: preprocessor 1/1, model 8/10
#> i Bootstrap3: preprocessor 1/1, model 8/10 (predictions)
#> i Bootstrap3: preprocessor 1/1, model 9/10
#> ✓ Bootstrap3: preprocessor 1/1, model 9/10
#> i Bootstrap3: preprocessor 1/1, model 9/10 (predictions)
#> i Bootstrap3: preprocessor 1/1, model 10/10
#> ✓ Bootstrap3: preprocessor 1/1, model 10/10
#> i Bootstrap3: preprocessor 1/1, model 10/10 (predictions)
#> i Bootstrap4: preprocessor 1/1
#> ✓ Bootstrap4: preprocessor 1/1
#> i Bootstrap4: preprocessor 1/1, model 1/10
#> ✓ Bootstrap4: preprocessor 1/1, model 1/10
#> i Bootstrap4: preprocessor 1/1, model 1/10 (predictions)
#> i Bootstrap4: preprocessor 1/1, model 2/10
#> ✓ Bootstrap4: preprocessor 1/1, model 2/10
#> i Bootstrap4: preprocessor 1/1, model 2/10 (predictions)
#> i Bootstrap4: preprocessor 1/1, model 3/10
#> ✓ Bootstrap4: preprocessor 1/1, model 3/10
#> i Bootstrap4: preprocessor 1/1, model 3/10 (predictions)
#> i Bootstrap4: preprocessor 1/1, model 4/10
#> ✓ Bootstrap4: preprocessor 1/1, model 4/10
#> i Bootstrap4: preprocessor 1/1, model 4/10 (predictions)
#> i Bootstrap4: preprocessor 1/1, model 5/10
#> ✓ Bootstrap4: preprocessor 1/1, model 5/10
#> i Bootstrap4: preprocessor 1/1, model 5/10 (predictions)
#> i Bootstrap4: preprocessor 1/1, model 6/10
#> ✓ Bootstrap4: preprocessor 1/1, model 6/10
#> i Bootstrap4: preprocessor 1/1, model 6/10 (predictions)
#> i Bootstrap4: preprocessor 1/1, model 7/10
#> ✓ Bootstrap4: preprocessor 1/1, model 7/10
#> i Bootstrap4: preprocessor 1/1, model 7/10 (predictions)
#> i Bootstrap4: preprocessor 1/1, model 8/10
#> ✓ Bootstrap4: preprocessor 1/1, model 8/10
#> i Bootstrap4: preprocessor 1/1, model 8/10 (predictions)
#> i Bootstrap4: preprocessor 1/1, model 9/10
#> ✓ Bootstrap4: preprocessor 1/1, model 9/10
#> i Bootstrap4: preprocessor 1/1, model 9/10 (predictions)
#> i Bootstrap4: preprocessor 1/1, model 10/10
#> ✓ Bootstrap4: preprocessor 1/1, model 10/10
#> i Bootstrap4: preprocessor 1/1, model 10/10 (predictions)
#> i Bootstrap5: preprocessor 1/1
#> ✓ Bootstrap5: preprocessor 1/1
#> i Bootstrap5: preprocessor 1/1, model 1/10
#> ✓ Bootstrap5: preprocessor 1/1, model 1/10
#> i Bootstrap5: preprocessor 1/1, model 1/10 (predictions)
#> i Bootstrap5: preprocessor 1/1, model 2/10
#> ✓ Bootstrap5: preprocessor 1/1, model 2/10
#> i Bootstrap5: preprocessor 1/1, model 2/10 (predictions)
#> i Bootstrap5: preprocessor 1/1, model 3/10
#> ✓ Bootstrap5: preprocessor 1/1, model 3/10
#> i Bootstrap5: preprocessor 1/1, model 3/10 (predictions)
#> i Bootstrap5: preprocessor 1/1, model 4/10
#> ✓ Bootstrap5: preprocessor 1/1, model 4/10
#> i Bootstrap5: preprocessor 1/1, model 4/10 (predictions)
#> i Bootstrap5: preprocessor 1/1, model 5/10
#> ✓ Bootstrap5: preprocessor 1/1, model 5/10
#> i Bootstrap5: preprocessor 1/1, model 5/10 (predictions)
#> i Bootstrap5: preprocessor 1/1, model 6/10
#> ✓ Bootstrap5: preprocessor 1/1, model 6/10
#> i Bootstrap5: preprocessor 1/1, model 6/10 (predictions)
#> i Bootstrap5: preprocessor 1/1, model 7/10
#> ✓ Bootstrap5: preprocessor 1/1, model 7/10
#> i Bootstrap5: preprocessor 1/1, model 7/10 (predictions)
#> i Bootstrap5: preprocessor 1/1, model 8/10
#> ✓ Bootstrap5: preprocessor 1/1, model 8/10
#> i Bootstrap5: preprocessor 1/1, model 8/10 (predictions)
#> i Bootstrap5: preprocessor 1/1, model 9/10
#> ✓ Bootstrap5: preprocessor 1/1, model 9/10
#> i Bootstrap5: preprocessor 1/1, model 9/10 (predictions)
#> i Bootstrap5: preprocessor 1/1, model 10/10
#> ✓ Bootstrap5: preprocessor 1/1, model 10/10
#> i Bootstrap5: preprocessor 1/1, model 10/10 (predictions)
#> # Tuning results
#> # Bootstrap sampling
#> # A tibble: 5 x 4
#> splits id .metrics .notes
#> <list> <chr> <list> <list>
#> 1 <split [351/136]> Bootstrap1 <tibble [20 × 6]> <tibble [0 × 1]>
#> 2 <split [351/124]> Bootstrap2 <tibble [20 × 6]> <tibble [0 × 1]>
#> 3 <split [351/134]> Bootstrap3 <tibble [20 × 6]> <tibble [0 × 1]>
#> 4 <split [351/130]> Bootstrap4 <tibble [20 × 6]> <tibble [0 × 1]>
#> 5 <split [351/138]> Bootstrap5 <tibble [20 × 6]> <tibble [0 × 1]>
Created on 2021-03-10 by the reprex package (v1.0.0)
In this particular case, tuning walks through each resample in turn: it first trains the recipe on that resample (once per resample, the "preprocessor 1/1" lines in the log), then fits the first learner/parameter option, then evaluates predictions on the held-out set for that resample and learner/parameter option, and then moves on to the next learner/parameter option.
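The steps above can be reproduced by hand for a single resample. This is only a sketch of the order of operations, not what tune runs internally. The hyperparameter values (cost = 1, rbf_sigma = 0.01) are arbitrary assumptions standing in for one candidate from the grid, and it uses 10-fold cross-validation as in the question rather than bootstraps.

```r
library(tidymodels)
library(themis)

data(Ionosphere, package = "mlbench")

iono_rec <-
  recipe(Class ~ ., data = Ionosphere) %>%
  step_zv(all_predictors()) %>%
  step_lincomb(all_numeric()) %>%
  step_downsample(Class)

set.seed(123)
iono_folds <- vfold_cv(Ionosphere, v = 10, strata = Class)

# Take the first fold as an example
split <- iono_folds$splits[[1]]

# 1. Estimate the recipe using only the analysis (training) part of the fold
fold_rec <- prep(iono_rec, training = analysis(split))

# 2. Processed analysis data: downsampling has been applied here
train_baked <- bake(fold_rec, new_data = NULL)
count(train_baked, Class)

# 3. Processed assessment (held-out) data: step_downsample() has
#    skip = TRUE by default, so no rows are dropped here
test_baked <- bake(fold_rec, new_data = assessment(split))
count(test_baked, Class)

# 4. Fit one candidate model (assumed, arbitrary hyperparameter values)
svm_fit <-
  svm_rbf(cost = 1, rbf_sigma = 0.01) %>%
  set_mode("classification") %>%
  set_engine("kernlab") %>%
  fit(Class ~ ., data = train_baked)

# 5. Predict on the held-out data and compute a metric
preds <- predict(svm_fit, new_data = test_baked)
accuracy_vec(truth = test_baked$Class, estimate = preds$.pred_class)
```

Because the downsampling step is skipped when baking new data, the held-out rows keep their original class balance, so the performance estimate reflects the imbalance you would see in practice.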

Julia Silge