

I am experiencing difficulties trying to train a model on time-series data. For this purpose I decided to use the mlr3 framework, specifically mlr3tuning::AutoTuner. The whole setup looks like this:

at <- mlr3tuning::AutoTuner$new(
  learner = mlr3::lrn("classif.xgboost"),
  resampling = mlr3::rsmp("RollingWindowCV", window_size = 86400, horizon = 28800, folds = 24, fixed_window = FALSE),
  measure = mlr3::msr("classif.costs", costs = costs),
  search_space = ps,
  terminator = mlr3tuning::trm("clock_time", stop_time = as.POSIXct("2021-08-13 10:00:00")),
  tuner = mlr3tuning::tnr("random_search")
)  
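
Here `costs` is a cost matrix whose rows and columns are named with the task's class labels (-1, 0, 1 in my case), as msr("classif.costs") requires, and `ps` is a paradox search space over xgboost hyperparameters. I am not reproducing my exact objects; schematically they look like the sketch below (the concrete numbers and ranges are placeholders, not my real values):

    # cost matrix for classif.costs: rows and columns must be named with the
    # class labels of the task; values below are placeholders
    costs <- matrix(
      c(0, 1, 2,
        1, 0, 1,
        2, 1, 0),
      nrow = 3,
      dimnames = list(c("-1", "0", "1"), c("-1", "0", "1"))
    )

    # search space over a few xgboost hyperparameters; ranges are placeholders
    ps <- paradox::ps(
      eta       = paradox::p_dbl(lower = 0.01, upper = 0.3),
      max_depth = paradox::p_int(lower = 2, upper = 8),
      nrounds   = paradox::p_int(lower = 50, upper = 500)
    )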

The error message I am getting looks like this:

Error in .__Archive__add_evals(self = self, private = private, super = super,  : 
  Assertion on 'ydt[, self$cols_y, with = FALSE]' failed: Contains missing values (column 'classif.costs', row 1).

I tried to handle the issue myself, and this is what I have tried:

  1. At first I attempted the easy solution: if the error message says there is something wrong with msr("classif.costs", costs = costs), let's change it to msr("classif.acc"). But all that did was change the measure named in the error message.

  2. Secondly, I made sure there were no NA, NaN, Inf or -Inf values in my train set, but the next attempt yielded the identical error message.

     > df <- task$data()
     > sapply(df, function(x) sum(is.na(x))) %>% sum
      [1] 0
     > sapply(df, function(x) sum(is.nan(x))) %>% sum
      [1] 0
     > sapply(df, function(x) sum(is.infinite(x))) %>% sum
      [1] 0
    
  3. Finally, I came across a similar issue resolved on mlr3's GitHub: Error on missing values without missing values. The cause was identified and described as: a very unbalanced dataset causes some of the cross-validation resamples not to include all of the labels. So I started checking whether that also applies to my problem:

    • Imbalance first - the data is somewhat imbalanced, but I frankly don't think it could create an incomplete (label-wise) CV resampling group.

      > df[[task$target_names]] %>% table
      .
          -1      0      1 
      133024 413200 123584
      
    • Resampling itself - if we take a look at the resampling scheme, it is clear that the groups most likely to be missing a label are the test groups, since each of them contains only 28800 observations; but let's check all of them. The code below shows that every one of them contains the full set of labels.

      Disclaimer
      I am aware that those are randomly split, but after hundreds of repetitions I was still unable to find one without the full set of labels.

      > resample$instantiate(task)
      > rs <- resample$instance
      > sapply(1:24, function(x) df[[task$target_names]][rs$train[[x]]] %>% unique %>% length)
       [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      > sapply(1:24, function(x) df[[task$target_names]][rs$test[[x]]] %>% unique %>% length)
       [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
      

But my thought process might be faulty, and the resampling might be the reason I cannot train the model. The only problem I have with that assumption is that the error occurs at the xth evaluation. So, is the problem elsewhere, or is the resampling rerun in each tuning evaluation until it creates an incomplete group and yields an error - is that possible?

I did try to test this with constant hyperparameters on the iris set with random labels, but my results were inconclusive. So I am still asking: what am I doing wrong?

Anyway, thanks for any answers, Cheers!

Radbys

1 Answer


It is hard to provide an accurate answer here as the data is missing and it looks like there is some non-deterministic component involved.

So, is the problem elsewhere, or is the resampling rerun in each tuning evaluation until it creates an incomplete group and yields an error - is that possible?

The resampling is created once during the tuning and all tuning settings are evaluated on the same splits.

I suggest setting a seed for reproducibility and investigating further with a "normal" resampling (i.e. not in a nested/tuning scenario).
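
For example, a sketch along these lines (assuming `task` and the rolling-window resampling from the question, here named `resampling` to avoid shadowing mlr3::resample(), with an xgboost learner at fixed default hyperparameters):

    library(mlr3)

    set.seed(42)  # make the splits and stochastic learner behaviour reproducible

    learner <- lrn("classif.xgboost")
    rr <- resample(task, learner, resampling, store_models = TRUE)

    rr$errors                         # errors recorded per iteration, if any
    rr$score(msr("classif.acc"))      # per-fold scores outside the tuning loop
    rr$aggregate(msr("classif.acc"))  # aggregated score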

Model trainings may fail within a tuning scenario if hyperparameter combinations are drawn that don't go well together. But these failures are usually unrelated to the training data.
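
One way to make such failures visible, and to keep a single failed fold from leaving a missing score, is mlr3's encapsulation/fallback mechanism. A sketch using the learner's $encapsulate and $fallback fields, assuming the evaluate package is installed:

    learner <- mlr3::lrn("classif.xgboost")

    # catch errors during train/predict instead of stopping the whole run
    learner$encapsulate <- c(train = "evaluate", predict = "evaluate")

    # if a training fails, predict with a featureless baseline so a score
    # can still be computed for that fold
    learner$fallback <- mlr3::lrn("classif.featureless")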

You could try to increase the debug output by setting the logger threshold to "debug".
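
mlr3 logs via the lgr package; the tuning/optimization layer (bbotk) has its own logger:

    # more verbose output from mlr3 itself and from the tuning layer
    lgr::get_logger("mlr3")$set_threshold("debug")
    lgr::get_logger("bbotk")$set_threshold("debug")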

In general it often helps to simplify the problem: use a subset of the dataset, reduce the tuning space, etc. Dealing with unbalanced datasets and/or outliers can be troublesome, especially for some learners (xgboost being one of them).
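
For instance, you could debug on a contiguous slice of the task; keeping a prefix of the rows preserves the time ordering for the rolling-window splits (the cut-off below is arbitrary):

    small_task <- task$clone()
    small_task$filter(rows = small_task$row_ids[1:200000])
    small_task$nrow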

pat-s