Why is the ROC_AUC from cross_val_score so much higher than manually using a StratifiedKFold with metrics.roc_auc_score for an XGB classifier?

Question

Method 1 - StratifiedKFold cross validation

skf = StratifiedKFold(n_splits=5, shuffle=False)

roc_aucs_temp = []

for i, (train_index, test_index) in enumerate(skf.split(X_train_xgb, y_train_xgb)):
    X_train_fold, X_test_fold = X_train_xgb.iloc[train_index], X_train_xgb.iloc[test_index]
    y_train_fold, y_test_fold = y_train_xgb[train_index], y_train_xgb[test_index]
    xgb_temp.fit(X_train_fold, y_train_fold)
    y_pred = model.predict(X_test_fold)
    roc_aucs_temp.append(metrics.roc_auc_score(y_test_fold, y_pred))

print(roc_aucs_temp)
[0.8622474747474748, 0.8497474747474747, 0.9045918367346939, 0.8670918367346939, 0.879591836734694]

Method 2 - cross_val_score

# this uses the same CV object as method 1 

print(cross_val_score(xgb, X_train_xgb, y_train_xgb, cv=skf, scoring='roc_auc')) 

[0.9614899  0.94861111 0.96045918 0.97270408 0.96977041]

I might be misunderstanding how cross_val_score works, but as I understand it, it splits the data into K folds, trains the model on K-1 folds, tests on the remaining fold, and repeats this for each fold. It should give roughly the same ROC AUC as manually creating the K folds with StratifiedKFold. Why doesn't it?

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html

Does this answer your question? [What is the difference between cross\_val\_score with scoring='roc\_auc' and roc\_auc\_score?](https://stackoverflow.com/questions/33642158/what-is-the-difference-between-cross-val-score-with-scoring-roc-auc-and-roc-au) — First Last, Jan 26 '23 at 16:28

score 0 · Answer 1 · answered Jan 26 '23 at 00:29

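The documentation for roc_auc_score says its second argument, y_score, should be the target scores (probability estimates of the positive class or non-thresholded decision values), not the predicted labels. As in the example there, you probably want something like model.predict_proba(X_test_fold)[:, 1] instead of model.predict(X_test_fold). With hard 0/1 predictions the ROC curve has only a single threshold point, so the AUC usually comes out lower. cross_val_score with scoring='roc_auc' evaluates the model on those continuous scores, and that is why you are seeing the difference.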
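To see the effect in isolation, here is a minimal sketch. It does not use the asker's data or model: the dataset comes from make_classification and a RandomForestClassifier stands in for the XGBoost classifier, both assumptions made only so the snippet runs with scikit-learn alone. It computes the per-fold AUC once from hard labels and once from positive-class probabilities, then compares with cross_val_score on the same splitter.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption: not the dataset from the question).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Stand-in classifier (assumption: replaces the XGBoost model from the question).
clf = RandomForestClassifier(n_estimators=50, random_state=0)
skf = StratifiedKFold(n_splits=5, shuffle=False)

auc_from_labels = []
auc_from_scores = []
for train_index, test_index in skf.split(X, y):
    clf.fit(X[train_index], y[train_index])
    # Hard 0/1 labels: the ROC curve is built from a single threshold.
    y_label = clf.predict(X[test_index])
    # Positive-class probability: what roc_auc_score expects as y_score.
    y_score = clf.predict_proba(X[test_index])[:, 1]
    auc_from_labels.append(roc_auc_score(y[test_index], y_label))
    auc_from_scores.append(roc_auc_score(y[test_index], y_score))

# Same splitter, same estimator settings, built-in 'roc_auc' scorer.
cv_scores = cross_val_score(clf, X, y, cv=skf, scoring="roc_auc")

print("manual, from labels:", np.round(auc_from_labels, 4))
print("manual, from scores:", np.round(auc_from_scores, 4))
print("cross_val_score:    ", np.round(cv_scores, 4))
```

In this sketch the probability-based manual scores should line up with cross_val_score fold by fold, while the label-based ones differ. With an XGBoost classifier the same change applies: pass predict_proba(...)[:, 1] to roc_auc_score in the manual loop.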
