gridsearchCV - shuffle data for every single parameter combination

Question

I am using gridsearchCV to determine model hyper-parameters:

pipe = Pipeline(steps=[(self.FE, FE_algorithm), (self.CA, Class_algorithm)])
param_grid = {**FE_grid, **CA_grid} 

scorer = make_scorer(f1_score, average='macro')
       
search = GridSearchCV(pipe, param_grid, cv=ShuffleSplit(test_size=0.20, n_splits=5,random_state=0), n_jobs=-1,
                              verbose=3, scoring=scorer)

search.fit(self.data_input, self.data_output)

However, I believe I am running into some problems with overfitting: results

I would like to shuffle the data under every single parameter combination, is there any way to do this? Currently, with the k-fold cross validation the same sets of validation data are being evaluated for each parameter combination, k-fold, and so overfitting is becoming an issue.

score 0 · Accepted Answer · answered Aug 16 '21 at 16:21

No, there isn't. The search splits the data once and creates a task for each combination of fold and parameter combination (source).

Shuffling per parameter combination is probably not desirable anyway: the selection might then just pick the "easiest" split instead of the "best" parameter. If you think you are overfitting to the validation folds, then consider using

fewer parameter options
more folds, or repeated splits*
a scoring callable that customizes evaluation
models that are more conservative

*my favorite among these, although the computation cost may be too high

gridsearchCV - shuffle data for every single parameter combination

1 Answers1