
I'm working on a binary classification problem where I have ~30 features of enzyme substrates to predict EC1 and EC2. I'm using XGBoost with Optuna for hyperparameter tuning. However, I'm observing a discrepancy between the AUC ROC values reported by Optuna and the ones computed with scikit-learn's roc_auc_score.
The output from Optuna:

AUC ROC score 1: 0.7109184689577985
AUC ROC score 2: 0.6030927230046949

But the AUC ROC scores computed with sklearn for the best parameters found by Optuna are:

AUC ROC score 1: 0.7065598459411416
AUC ROC score 2: 0.5656470070422535

The code for it goes like this:

import xgboost as xgb
import optuna
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
import numpy as np

# Setting a fixed random seed for reproducibility
np.random.seed(42)

def train_model(x_train, y_train, x_eval, y_eval):
    def objective(trial):
        param = {
            'objective': 'binary:logistic',
            'eval_metric': 'auc',
            'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
            'max_depth': trial.suggest_int('max_depth', 3, 6),
            'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.1, log=True),
            'subsample': trial.suggest_float('subsample', 0.5, 1),
            'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1),
            'reg_alpha': trial.suggest_float('reg_alpha', 0, 10),
            'reg_lambda': trial.suggest_float('reg_lambda', 0, 10),
            'gamma': trial.suggest_float('gamma', 0.01, 1, log=True),
            'random_state': 42,
            'early_stopping_rounds': 10
        }

        model = xgb.XGBClassifier(**param)

        model.fit(x_train, y_train, eval_set=[(x_eval, y_eval)], verbose=False)
        y_pred = model.predict_proba(x_eval)[:, 1]
        auc_roc = roc_auc_score(y_eval, y_pred)

        return auc_roc

    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=100)

    return study.best_trial.params, study.best_trial.value

# Splitting the data into train and evaluation sets
x_train, x_eval, y_train, y_eval = train_test_split(x_train, y_train, test_size=0.2, random_state=42)

# For EC1
best_params_1, best_auc_1 = train_model(x_train, y_train[:, 0], x_eval, y_eval[:, 0])
classifier_1 = xgb.XGBClassifier(**best_params_1)
classifier_1.fit(x_train, y_train[:, 0])
y_pred_1 = classifier_1.predict_proba(x_eval)[:, 1]

# For EC2
best_params_2, best_auc_2 = train_model(x_train, y_train[:, 1], x_eval, y_eval[:, 1])
classifier_2 = xgb.XGBClassifier(**best_params_2)
classifier_2.fit(x_train, y_train[:, 1])
y_pred_2 = classifier_2.predict_proba(x_eval)[:, 1]

auc_score_1 = roc_auc_score(y_eval[:, 0], y_pred_1)
auc_score_2 = roc_auc_score(y_eval[:, 1], y_pred_2)

I have implemented the XGBoost model with hyperparameter tuning using Optuna. I expected the AUC ROC values reported by Optuna to be consistent with the values calculated with scikit-learn's roc_auc_score for the best parameters. However, the actual results show a noticeable difference between the two. What could be causing this discrepancy?
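
In case it helps, below is a sketch of how I would try to re-run the best trial's configuration outside of the study. The fixed_params dict is my own addition: as far as I understand, study.best_trial.params only contains the hyperparameters passed through trial.suggest_*, so the constant entries from the objective would have to be merged back in by hand, and the fit would need the same eval_set. It continues from the code above (same variables, xgboost >= 1.6 assumed for the early_stopping_rounds constructor argument):

import xgboost as xgb
from sklearn.metrics import roc_auc_score

# fixed_params mirrors the constant entries of the objective's param dict;
# study.best_trial.params does not include them.
fixed_params = {
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'random_state': 42,
    'early_stopping_rounds': 10
}

# Re-fit EC1 with the same evaluation set and early stopping as inside objective
check_model = xgb.XGBClassifier(**best_params_1, **fixed_params)
check_model.fit(x_train, y_train[:, 0], eval_set=[(x_eval, y_eval[:, 0])], verbose=False)
check_auc_1 = roc_auc_score(y_eval[:, 0], check_model.predict_proba(x_eval)[:, 1])

I'm including this only to show what I mean by "consistent": the same configuration, data split, and early stopping as inside the objective.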

