I get different results when I run a grid search with sklearn's GridSearchCV and when I do the same search manually.
The first code block is my procedure using sklearn's GridSearchCV:
from sklearn import ensemble
from sklearn import model_selection
from sklearn.model_selection import LeaveOneGroupOut
# imblearn's Pipeline is used so the under-sampler is applied only to the training folds
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RandomUnderSampler

X = folded_train.drop(columns=["10_fold", "class_encoded"])
y = folded_train["class_encoded"]
ten_fold = folded_train["10_fold"]

logo = LeaveOneGroupOut()
cross_val_groups = logo.split(X, y, ten_fold)

classifier = Pipeline([
    ("sampling", RandomUnderSampler()),
    ("classifier", ensemble.RandomForestClassifier(n_jobs=-1)),
])

param_grid = {
    "classifier__n_estimators": [100, 200, 300, 400, 600],
    "classifier__max_depth": [1, 3, 5, 7],
    "classifier__criterion": ["gini", "entropy"],
}

model = model_selection.GridSearchCV(
    estimator=classifier,
    param_grid=param_grid,
    scoring="roc_auc",
    verbose=10,
    n_jobs=1,
    cv=cross_val_groups,
)

model.fit(X, y)
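For reference, this is roughly how I read the per-split scores quoted further down back out of the fitted search (a minimal sketch: cv_results_ and its split<i>_test_score keys are standard GridSearchCV attributes, and I look up the row for one parameter combination by matching its params dict):

import numpy as np

results = model.cv_results_
# The parameter combination whose per-fold scores are quoted below.
target = {
    "classifier__criterion": "gini",
    "classifier__max_depth": 1,
    "classifier__n_estimators": 100,
}
idx = results["params"].index(target)  # row of this combination in the grid
per_split = [results[f"split{i}_test_score"][idx] for i in range(10)]  # one score per CV fold
print(np.round(per_split, 3))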
I then try to reproduce the same procedure manually. Here is my code:
from sklearn import ensemble
from sklearn import metrics
from sklearn.model_selection import LeaveOneGroupOut
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RandomUnderSampler

X = folded_train.drop(columns=["10_fold", "class_encoded"])
y = folded_train["class_encoded"]
ten_fold = folded_train["10_fold"]

number_of_estimators = [100, 200, 300]
maximum_depths = [1, 3, 5, 7]
criterions = ["gini", "entropy"]

logo = LeaveOneGroupOut()

for criterion in criterions:
    for max_depth in maximum_depths:
        for n_of_estimator in number_of_estimators:
            for train_index, val_index in logo.split(X, y, ten_fold):
                aPipeline = Pipeline(steps=[
                    ("sampling", RandomUnderSampler()),
                    ("classifier", ensemble.RandomForestClassifier(
                        criterion=criterion,
                        max_depth=max_depth,
                        n_estimators=n_of_estimator,
                        n_jobs=-1)),
                ])
                X_trn, X_vl = X.iloc[train_index], X.iloc[val_index]
                y_trn, y_vl = y.iloc[train_index], y.iloc[val_index]
                aPipeline.fit(X_trn, y_trn)
                predictions = aPipeline.predict(X_vl)
                print("Criterion", criterion,
                      "Max depth", max_depth,
                      "Number of estimators", n_of_estimator,
                      "score", metrics.roc_auc_score(y_vl, predictions))
With sklearn's GridSearchCV I obtained the following ROC AUC scores for one specific parameter combination:
For criterion = "gini", max_depth = 1 and n_estimators = 100: [0.786, 0.799, 0.789, 0.796, 0.775, 0.776, 0.779, 0.788, 0.770, 0.769], one score per CV iteration.
My manual run with the same parameters gives: [0.730, 0.749, 0.714, 0.710, 0.732, 0.724, 0.711, 0.724, 0.715, 0.734]
The same gap shows up for the other parameter combinations too. What factors could lead to this kind of discrepancy?
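One thing I have not been able to rule out is the scoring step itself: metrics.roc_auc_score on hard predict() labels is not necessarily the same quantity as a scorer computed from predicted probabilities. Below is a quick single-fold check comparing the two (a sketch only: check_pipeline is a throwaway copy of the pipeline above, and I am assuming a binary target so that the second column of predict_proba is the positive class):

# Single-fold check: does scoring hard labels vs. probabilities change the AUC?
# (check_pipeline is a hypothetical throwaway copy of the pipeline above.)
train_index, val_index = next(logo.split(X, y, ten_fold))
X_trn, X_vl = X.iloc[train_index], X.iloc[val_index]
y_trn, y_vl = y.iloc[train_index], y.iloc[val_index]

check_pipeline = Pipeline(steps=[
    ("sampling", RandomUnderSampler()),
    ("classifier", ensemble.RandomForestClassifier(
        criterion="gini", max_depth=1, n_estimators=100, n_jobs=-1)),
])
check_pipeline.fit(X_trn, y_trn)

auc_from_labels = metrics.roc_auc_score(y_vl, check_pipeline.predict(X_vl))
auc_from_probas = metrics.roc_auc_score(y_vl, check_pipeline.predict_proba(X_vl)[:, 1])  # assumes binary target
print("AUC from hard labels:", auc_from_labels)
print("AUC from probabilities:", auc_from_probas)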
Note: I found this question, but it does not answer my problem: Why GridSearchCV model results are different than the model I manually tuned?