
Consider the following grid search:

```python
grid = GridSearchCV(clf, parameters, n_jobs=-1, iid=True, cv=5)
grid_fit = grid.fit(X_train1, y_train1)
```

According to the scikit-learn documentation, `grid_fit.best_score_` is "the mean cross-validated score of the best_estimator".
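To make that definition concrete, here is a minimal sketch showing where `best_score_` comes from inside `cv_results_`. It assumes the Iris data, an `SVC` classifier and a hypothetical grid over `C` in place of the question's proprietary setup, and a current scikit-learn, where the per-fold mean is unweighted. In the older versions that accepted `iid=True` (as in the question), that option instead weighted the average by the number of test samples in each fold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
parameters = {"C": [0.1, 1, 10]}  # hypothetical grid, for illustration only
grid_fit = GridSearchCV(SVC(), parameters, cv=5).fit(X, y)

results = grid_fit.cv_results_
best = grid_fit.best_index_

# Per-fold test scores of the winning candidate (split0_test_score ... split4_test_score)
fold_scores = [results[f"split{k}_test_score"][best] for k in range(5)]

# best_score_ is the mean test score of the best candidate, i.e. the mean of its fold scores
print(grid_fit.best_score_, results["mean_test_score"][best], np.mean(fold_scores))
```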

To me, that means the average of

```python
cross_val_score(grid_fit.best_estimator_, X_train1, y_train1, cv=5)
```

should be exactly the same as `grid_fit.best_score_`.

However, I am getting a 10% difference between the two numbers. What am I missing?

I am running the grid search on proprietary data, so I am hoping somebody has run into something similar and can guide me without a fully reproducible example. If this is not clear enough, I will try to reproduce it with the Iris dataset.
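A minimal Iris reproduction of the comparison might look like the sketch below. The `SVC` stand-in for `clf` and the parameter grid are assumptions, chosen because `SVC` is deterministic. `iid` is omitted because it was removed in scikit-learn 0.24, and `n_jobs` is left at its default since it does not change the scores. With a deterministic estimator and the same integer `cv`, the two numbers are expected to agree.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X_train1, y_train1 = load_iris(return_X_y=True)
clf = SVC()  # hypothetical, deterministic stand-in for the question's clf
parameters = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}  # hypothetical grid

grid = GridSearchCV(clf, parameters, cv=5)
grid_fit = grid.fit(X_train1, y_train1)

# Re-score the best parameter setting with an independent 5-fold cross-validation
cv_mean = cross_val_score(grid_fit.best_estimator_, X_train1, y_train1, cv=5).mean()
print(grid_fit.best_score_, cv_mean)
```

If a gap like the one in the question shows up on real data, the inputs, the estimator's own randomness, and the splitter are the places to compare between the two calls.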

Eric F
  • It would be better if you could simply share the code you have written, excluding the data-related part. – Gambit1614 Jun 15 '18 at 17:05

1 Answer


When an integer is passed as the `cv` parameter, `GridSearchCV(..., cv=int_number)`, and the estimator is a classifier with a binary or multiclass target, `StratifiedKFold` is used for the cross-validation splitting. So the folds are produced by `StratifiedKFold`, and how the data set is split can affect the accuracy and therefore the best score.
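To see what an integer `cv` actually resolves to, `sklearn.model_selection.check_cv` can be called directly. This is a sketch on the Iris data (an assumption, since the question's data is proprietary). It also checks that, with the default `shuffle=False`, two passes over the same data yield the same folds.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, check_cv

X, y = load_iris(return_X_y=True)

# Resolve cv=5 the way GridSearchCV and cross_val_score do for a classifier
cv = check_cv(5, y, classifier=True)
print(cv)

# With shuffle=False, fold assignment does not depend on a random state
first = [test.tolist() for _, test in cv.split(X, y)]
second = [test.tolist() for _, test in cv.split(X, y)]
print(first == second)
```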

MaxU - stand with Ukraine
  • I think that was it. By manually setting the k-fold strategy with `skf = StratifiedKFold(n_splits=5, random_state=7425, shuffle=True)` and passing it to both functions (`GridSearchCV(..., cv=skf)` and `cross_val_score(..., cv=skf)`), the weird behaviour disappears. Thank you! – Eric F Jun 15 '18 at 18:46
  • Your response, Eric F, is actually the answer. This is odd behavior, as the documentation indicates that the parameters I'm using should result in the same splitter object being constructed. Yet grid search and `cross_val_score` gave significantly different results. I'm assuming the randomization is the issue, but given that repeated runs are all consistent, I can't see how that could be it. – Shawn Cicoria Apr 23 '20 at 17:09
  • Doesn't `cross_val_score` also exhibit the same behavior? Should you specify `cv`, `StratifiedKFold` will be used as well. How does this explain the difference? – MrSoLoDoLo Jun 15 '20 at 03:25
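Eric F's fix from the comments can be sketched as follows: build one splitter and pass the same object to both `GridSearchCV` and `cross_val_score`. This is a minimal sketch assuming the Iris data, `SVC`, and a hypothetical grid over `C`; `iid` is omitted because it was removed in scikit-learn 0.24. Because `random_state` is a fixed integer, every call to `skf.split()` yields the same shuffled folds, so both functions score the winning configuration on identical splits.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One splitter shared by both calls; the integer random_state makes the shuffle reproducible
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=7425)

parameters = {"C": [0.1, 1, 10]}  # hypothetical grid, for illustration only
grid_fit = GridSearchCV(SVC(), parameters, cv=skf).fit(X, y)

# Re-score the best parameter setting on exactly the same folds
scores = cross_val_score(grid_fit.best_estimator_, X, y, cv=skf)
print(grid_fit.best_score_, scores.mean())
```

Passing one explicit splitter object, rather than an integer to each call, makes it visible in the code that both evaluations use the same folds.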