
I'm new to tidymodels syntax and would like to implement leave-one-out cross-validation using loo_cv() from rsample within a tidymodels framework. However, it seems to work differently from vfold_cv(), and I can't find any helpful examples that use loo_cv(). (Yes, I've checked the help page for examples.)

I would like to emulate a workflow like the one below, taken from the fit_resamples() help page, but I cannot find an equivalent example for loo_cv(). When I swap in loo_cv(), fit_resamples() tells me that it does not support loo_cv, and I don't know what does. I assume the right solution involves fit_split(), but I can't get that to work either. I have been Googling and generating error messages for hours, though I imagine the solution is quite simple. Thank you in advance for any direction!

library(tidymodels)

folds <- vfold_cv(mtcars, v = 5)
# folds <- loo_cv(mtcars) # generates an error message with fit_resamples()
spline_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_ns(disp) %>%
  step_ns(wt)

lin_mod <- linear_reg() %>%
  set_engine("lm")

control <- control_resamples(save_pred = TRUE)

spline_res <- fit_resamples(lin_mod, spline_rec, folds, control = control)

spline_res %>%
  collect_predictions()
Jordan
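To reproduce the loo_cv() failure from the question in isolation, here is a minimal sketch that captures the condition instead of stopping the script. It assumes a recent tidymodels install; the exact wording of the message depends on the installed tune version, so it is printed rather than quoted here. The object names loo_folds and loo_res are just illustrative.

```r
library(tidymodels)

# One split per row of mtcars
loo_folds <- loo_cv(mtcars)

spline_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_ns(disp, wt)

lin_mod <- linear_reg() %>%
  set_engine("lm")

# tune rejects loo_cv objects, so catch the error rather than halting
loo_res <- tryCatch(
  fit_resamples(lin_mod, spline_rec, loo_folds),
  error = function(e) e
)

# Show whatever message the installed tune version produces
conditionMessage(loo_res)
```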


We don't really support LOO in tidymodels. It's a fairly deprecated method and you'd be better off using the bootstrap or many repeats of 10-fold CV.

We may work with it in the future, but from a support point of view, the overhead of that method is fairly high (since it behaves differently from all other methods). We'd rather spend time on other missing features for now.
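A minimal sketch of the two alternatives suggested above, reusing the spline recipe and linear model from the question. The values times = 25 and repeats = 5 are arbitrary illustrative choices, not numbers from the answer, and the seed is only there to make the resamples reproducible.

```r
library(tidymodels)

# Same preprocessing and model as in the question
spline_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_ns(disp, wt)

lin_mod <- linear_reg() %>%
  set_engine("lm")

set.seed(123)

# Option 1: bootstrap resamples (25 is an arbitrary choice)
boots <- bootstraps(mtcars, times = 25)

# Option 2: 10-fold CV repeated 5 times (also arbitrary), i.e. 50 resamples
rep_folds <- vfold_cv(mtcars, v = 10, repeats = 5)

ctrl <- control_resamples(save_pred = TRUE)

# Both resampling objects work with fit_resamples() as-is
boot_res <- fit_resamples(lin_mod, spline_rec, boots, control = ctrl)
cv_res <- fit_resamples(lin_mod, spline_rec, rep_folds, control = ctrl)

# Averaged performance estimates (RMSE and R-squared by default)
collect_metrics(boot_res)
collect_metrics(cv_res)
```

Either object drops into the question's workflow in place of folds; collect_predictions() also works on both because save_pred = TRUE.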

topepo

The following code works, but I don't think it really captures the efficiency or utility of the tidymodels approach. I would still love a better suggestion.

library(tidymodels)

loocvdat <- loo_cv(mtcars)

lm_spec <- linear_reg() %>%
  set_engine("lm")

# Fit the model on one LOO split and return its held-out prediction
splitfun <- function(mysplit) {
  fit_split(mpg ~ .,
            model = lm_spec,
            split = mysplit) %>%
    collect_predictions()
}

map(loocvdat$splits, splitfun)
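If LOO is still wanted, one way to make the loop above a little more tidymodels-like is to bundle the recipe and model in a workflow, fit it on each analysis() set, and stack the held-out predictions into one tibble that yardstick can score. This is a sketch, not an officially supported LOO path: it swaps fit_split() for workflow fit()/predict(), and the column names truth and .pred in the result are my own choice.

```r
library(tidymodels)

# One rsplit per row: 31 rows for fitting, 1 row held out
loo <- loo_cv(mtcars)

# Recipe and model bundled so preprocessing is re-estimated on each fit
wf <- workflow() %>%
  add_recipe(recipe(mpg ~ ., data = mtcars) %>% step_ns(disp, wt)) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# Fit on the analysis set, predict the single assessment row
loo_pred <- map_dfr(loo$splits, function(s) {
  held_out <- assessment(s)
  fitted_wf <- fit(wf, data = analysis(s))
  tibble(
    truth = held_out$mpg,
    .pred = predict(fitted_wf, new_data = held_out)$.pred
  )
})

# Pool all 32 held-out predictions into a single LOO estimate
rmse(loo_pred, truth = truth, estimate = .pred)
```

Because each assessment set has only one row, metrics are computed once on the pooled predictions rather than averaged per split.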
Jordan
I may be misunderstanding how the tidymodels flow is supposed to work, but I think even this won't do what you want: what I see generated from `loo_cv` is the entire training set duplicated by the number of rows, and nothing that identifies a train-test split within each fold. – Tom Wagstaff Aug 25 '23 at 10:02
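A sketch for checking what loo_cv() produces, assuming only rsample is loaded: each rsplit stores the data together with the row indices used for fitting, and analysis()/assessment() extract the two parts of each split.

```r
library(rsample)

loo <- loo_cv(mtcars)

# Look at the first split; printing it summarizes the Analysis/Assess/Total counts
first_split <- loo$splits[[1]]
first_split

# analysis() returns the rows used for fitting, assessment() the held-out row
train_rows <- analysis(first_split)
test_rows <- assessment(first_split)

nrow(train_rows)  # 31
nrow(test_rows)   # 1
```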