I'm evaluating an XGBoost classifier. I split the dataset into train and validation sets, run a cross-validation on the train set with the classifier's default parameters, and compute the ROC AUC:
from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

xgbClassCV = XGBClassifier()
kfold = StratifiedKFold(n_splits = 5)
auc = cross_val_score(xgbClassCV, x_train, y_train, scoring = "roc_auc", cv = kfold)
auc_avg = auc.mean()
The ROC AUC (auc_avg) is roughly 0.76.
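(For context, the train/validation split is a plain hold-out split, roughly as below; x, y and the test_size value are placeholders for my actual data and split ratio.)

from sklearn.model_selection import train_test_split

# hold-out split; x, y and test_size stand in for the real data and split ratio
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size = 0.2, random_state = 0)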
I then perform hyperparameter tuning through a randomized search with cross-validation on the train set:
from sklearn.model_selection import RandomizedSearchCV

xgbGrid = {..., ..., ..., ...}   # parameter distributions omitted
xgbClassHT = XGBClassifier()
kfold = StratifiedKFold(n_splits = 5)
xgbClassRand = RandomizedSearchCV(estimator = xgbClassHT, param_distributions = xgbGrid, n_iter = 60,
                                  cv = kfold, n_jobs = -1, verbose = 2)
xgbClassRand.fit(x_train, y_train)
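I read the best parameters off the fitted search object, roughly like this (best_params is just my placeholder name):

best_params = xgbClassRand.best_params_   # best combination found by the randomized search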
I then train an XGBoost classifier with those best parameters, make predictions on the validation set, and compute the ROC AUC:
from sklearn import metrics

xgbClassFT = XGBClassifier(..., ..., ..., ...)   # best parameters from the search
xgbClassFT.fit(x_train, y_train)
predictions = xgbClassFT.predict(x_val)
auc = metrics.roc_auc_score(y_val, predictions)
This ROC AUC (auc) is roughly 0.65, about 11 points lower than the one above. I find this puzzling, since it doesn't happen with the other scores I compute (accuracy, precision, recall and F1): those stay roughly the same.
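(For reference, the other scores are computed in the usual way, both cross-validated on the train set and on the validation set; the code below is a rough sketch of those calls rather than my exact script.)

from sklearn.model_selection import cross_val_score
from sklearn import metrics

# cross-validated scores on the train set (same 5-fold split and default classifier as above)
acc_cv = cross_val_score(xgbClassCV, x_train, y_train, scoring = "accuracy", cv = kfold).mean()
f1_cv = cross_val_score(xgbClassCV, x_train, y_train, scoring = "f1", cv = kfold).mean()

# validation-set scores, computed from the tuned model's label predictions
acc_val = metrics.accuracy_score(y_val, predictions)
prec_val = metrics.precision_score(y_val, predictions)
rec_val = metrics.recall_score(y_val, predictions)
f1_val = metrics.f1_score(y_val, predictions)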
I repeat the same process with a logistic regression and the same thing happens: an approximately 11-point gap between the two ROC AUCs, while the other scores again stay roughly the same.
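(The logistic regression version follows the same pattern; the sketch below uses placeholder variable names and omits the tuning step and the tuned parameters.)

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn import metrics

# cross-validated ROC AUC with the default logistic regression
logRegCV = LogisticRegression()
kfold = StratifiedKFold(n_splits = 5)
auc_lr_cv = cross_val_score(logRegCV, x_train, y_train, scoring = "roc_auc", cv = kfold).mean()

# tuned model, evaluated on the validation set in the same way as the XGBoost one
logRegFT = LogisticRegression()   # tuned parameters omitted
logRegFT.fit(x_train, y_train)
auc_lr_val = metrics.roc_auc_score(y_val, logRegFT.predict(x_val))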
Any help in understanding what could be going on here would be really appreciated. Thanks!