Implementing loo_cv from rsample in tidymodels

Question

I'm new to tidymodels syntax and would like to implement leave-one-out cross-validation using loo_cv from rsample within the tidymodels framework. However, the implementation seems different from vfold_cv, and I can't find any helpful examples that use loo_cv. Yes, I've checked the help page for examples.

I would like to emulate the workflow shown below, taken from the fit_resamples() help page, but I cannot find an equivalent example for loo_cv. When I swap loo_cv into the code below, I am told that fit_resamples() does not support loo_cv, but I do not know what does support it. I assume the right solution involves fit_split(), but I cannot get that to work either. I have been Googling and generating error messages for hours, though I imagine the solution is quite simple. Thank you in advance for any direction!

folds <- vfold_cv(mtcars, v = 5)
# folds <- loo_cv(mtcars)  # generates error message with fit_resamples()

spline_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_ns(disp) %>%
  step_ns(wt)

lin_mod <- linear_reg() %>%
  set_engine("lm")

control <- control_resamples(save_pred = TRUE)

spline_res <- fit_resamples(lin_mod, spline_rec, folds, control = control)

spline_res %>%
  collect_predictions()

score 3 · Accepted Answer · answered Jul 20 '20 at 16:30

We don't really support LOO in tidymodels. It's a fairly deprecated method and you'd be better off using the bootstrap or many repeats of 10-fold CV.

We may work with it in the future but, from a support point-of-view, the overhead of that method is fairly high (since it behaves differently than all other methods). We'd rather spend time on other missing features for now.
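
As a minimal sketch of the alternatives mentioned above, the question's workflow can be pointed at repeated 10-fold CV (or the bootstrap) just by changing the resampling object. The choice of `repeats = 5` and the seed are arbitrary assumptions for illustration, not recommendations from the answer.

```r
library(tidymodels)

set.seed(123)  # arbitrary seed, only so the resamples are reproducible

# Repeated 10-fold CV: 10 folds x 5 repeats = 50 resamples.
# repeats = 5 is an illustrative choice, not a recommendation.
folds <- vfold_cv(mtcars, v = 10, repeats = 5)

# Alternative: bootstrap resamples instead of repeated CV.
# boots <- bootstraps(mtcars, times = 25)

# Same recipe and model as in the question.
spline_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_ns(disp) %>%
  step_ns(wt)

lin_mod <- linear_reg() %>%
  set_engine("lm")

control <- control_resamples(save_pred = TRUE)

spline_res <- fit_resamples(lin_mod, spline_rec, folds, control = control)

# Performance metrics averaged over all 50 resamples.
collect_metrics(spline_res)
```

Passing `boots` instead of `folds` to fit_resamples() should work the same way, since both are ordinary rsample resampling objects.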

topepo

In this case, could I gently suggest that you remove the function from the package? – Tom Wagstaff Aug 25 '23 at 10:03

score 0 · Answer 2 · answered Jul 11 '20 at 18:13

The following code works but I don't think it is really capturing the efficiency or utility of the tidymodels approach. Would still love a better suggestion.

loocvdat <- loo_cv(mtcars)

lm_spec <- linear_reg() %>%
  set_engine("lm")

splitfun <- function(mysplit) {
  fit_split(mpg ~ .,
            model = lm_spec,
            split = mysplit) %>%
    collect_predictions()
}

map(loocvdat$splits, splitfun)
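
For comparison, here is a rough sketch of the same idea that avoids fit_split() and loops over the splits by hand, assuming only rsample's analysis() and assessment() accessors and a plain workflow. The helper name `loo_predict` is hypothetical, and this is not an officially supported LOO path in tidymodels.

```r
library(tidymodels)

loo_splits <- loo_cv(mtcars)

# Bundle the formula and the model so each split is fit the same way.
wf <- workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# Hypothetical helper: fit on the analysis rows of one split and
# predict the single held-out row returned by assessment().
loo_predict <- function(split) {
  fitted <- fit(wf, data = analysis(split))
  held_out <- assessment(split)
  predict(fitted, new_data = held_out) %>%
    bind_cols(held_out %>% select(mpg))
}

# One row of predictions per held-out observation.
loo_preds <- map_dfr(loo_splits$splits, loo_predict)

# Overall LOO error computed from the pooled held-out predictions.
rmse(loo_preds, truth = mpg, estimate = .pred)
```

Metrics are computed once on the pooled predictions because each assessment set holds a single row, so per-split metrics such as R-squared are not defined.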

Jordan

I may be misunderstanding how the tidymodels flow is supposed to work, but I think even this won't do what you want because what I see generated from `loo_cv` is the entire training set duplicated by the number of rows, and nothing that identifies a train-test split within each fold... – Tom Wagstaff Aug 25 '23 at 10:02