
I was performing cross-validation with xgboost.cv but then wanted to switch to cross_val_score in order to use it with GridSearchCV. Before moving on to hyperparameter tuning, I checked whether the results from xgboost.cv and cross_val_score are similar, and found out that there are huge differences.

I use xgboost.cv as:

params = {"objective":"binary:logistic",'colsample_bytree': 1,'learning_rate': 0.3, 'max_depth': 6, 'alpha': 0}

dmatrix = xgboost.DMatrix(table_X,table_y)

xgb_cv = xgboost.cv(dtrain=dmatrix, params=params, nfold=5,
                    num_boost_round=100, early_stopping_rounds=10, metrics="aucpr", as_pandas=True)

and the last row for the xgb_cv is:

train-aucpr-mean  train-aucpr-std  test-aucpr-mean  test-aucpr-std
0.81              0.00             0.77             0.00

For cross_val_score I use:

xgb = xgboost.XGBClassifier(n_estimators=100, **params)

skf = StratifiedKFold(n_splits=5)
cross_val_scores = cross_val_score(xgb,table_X,table_y, scoring='average_precision', cv=skf)

And it ends up with a mean of 0.64. That is a worrisome difference. What am I doing wrong?

Secondly, the 0 standard deviation in the xgboost.cv results looks quite strange.

overb
    Can you please provide a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) of your code? – Kim Tang Aug 19 '22 at 09:01
  • @KimTang what is missing here? I can't share the data and besides that I think there is no more code I can add to that issue. – overb Aug 19 '22 at 09:52
  • The data, or an example snippet containing some part of the data to reproduce the issue, is missing. It's easier for others to understand the problem if they can reproduce and debug it. StupidWolf's answer below contains a minimal reproducible example with an available dataset. – Kim Tang Aug 19 '22 at 14:00

1 Answer


In the xgboost.cv call the metric is "aucpr" (thanks to @BenReiniger for pointing this out). According to the documentation, this is the area under the precision-recall curve computed with the linear trapezoidal method, whereas average_precision from sklearn uses a different, step-wise method.
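To see that the two summaries of the same precision-recall curve genuinely differ, here is a small sketch in plain numpy with made-up labels and scores (illustrative only): average precision sums precision times the step increase in recall, while the trapezoidal rule interpolates linearly between points on the curve.

```python
import numpy as np

# toy labels and scores (illustrative only)
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.6, 0.7])

# precision and recall at each descending-score threshold
order = np.argsort(-scores)
tp = np.cumsum(y_true[order])
precision = tp / np.arange(1, len(scores) + 1)
recall = tp / y_true.sum()

# average precision (sklearn's method): step-wise sum of
# precision times the increase in recall
ap = np.sum(np.diff(np.concatenate(([0.0], recall))) * precision)

# trapezoidal area under the PR curve (xgboost's "aucpr" style),
# anchoring the curve at (recall=0, precision=1)
r = np.concatenate(([0.0], recall))
p = np.concatenate(([1.0], precision))
aucpr = np.sum(np.diff(r) * (p[1:] + p[:-1]) / 2)

print(ap, aucpr)  # ≈ 0.867 vs ≈ 0.85 -- same curve, different summaries
```

Same predictions, same curve, yet the two numbers already disagree on six points; with a real dataset the gap can be much larger.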

So if we stick to the method used by sklearn (the equivalent in xgboost is "map"), it gives a very similar score.

Example dataset:

from sklearn import datasets
import xgboost
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = datasets.load_iris()
X = iris.data
y = (iris.target == 1).astype(int)
dmatrix = xgboost.DMatrix(X,y)

Define the params; we can use the same k-fold splitter for both functions:

params = {"objective":"binary:logistic",'colsample_bytree': 1,'learning_rate': 0.3, 'max_depth': 6, 'alpha': 0}

skf = StratifiedKFold(n_splits=5)

You cannot use early stopping here, because sklearn's cross_val_score does not support it, so we have to boost for the same fixed number of rounds in both:

xgb_cv = xgboost.cv(dtrain=dmatrix, params=params, folds = skf, metrics = "map", as_pandas=True, num_boost_round = 100)

xgb = xgboost.XGBClassifier(n_estimators=100, **params)

cross_val_scores = cross_val_score(xgb,X,y, scoring='average_precision',cv=skf)

print(cross_val_scores)
[1.         1.         0.8915404  0.91916667 1.        ]

The above gives a mean of 0.9621414141414141.

And the xgboost.cv result, with the test-map-mean column similar to the above:

    train-map-mean  train-map-std  test-map-mean  test-map-std
95        0.999878       0.000244       0.962562      0.046144
96        0.999878       0.000244       0.962562      0.046144
97        0.999878       0.000244       0.962562      0.046144
98        0.999878       0.000244       0.962562      0.046144
99        0.999878       0.000244       0.962562      0.046144

To use the trapezoidal method (i.e. interpolation), the equivalents in xgboost and sklearn:

xgb_cv = xgboost.cv(dtrain=dmatrix, params=params, folds = skf, metrics = "aucpr", as_pandas=True, num_boost_round = 100)

cross_val_scores = cross_val_score(xgb,X,y, scoring='roc_auc',cv=skf)
StupidWolf
  • @BenReiniger you are right, thanks for pointing it out. The equivalent of aucpr would be roc_auc in sklearn; it's a question of the trapezoidal versus the non-trapezoidal method. – StupidWolf Aug 19 '22 at 14:52
  • roc_auc is the ROC curve not the PR curve. I think doing something like `auc(pr_curve(...))` is the analogue of xgb's `aupr`. – Ben Reiniger Aug 19 '22 at 17:47
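Following the last comment, a sketch of the closer sklearn analogue of xgboost's "aucpr": compute the precision-recall curve and take the trapezoidal area under it with auc (toy labels and scores for illustration; the helper name pr_auc is made up here, and it could be wrapped with sklearn's make_scorer to plug into cross_val_score):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc, average_precision_score

def pr_auc(y_true, y_score):
    """Trapezoidal area under the precision-recall curve --
    the analogue of xgboost's "aucpr", per the comment above."""
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    return auc(recall, precision)  # auc handles the decreasing recall axis

# toy data (illustrative only)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

print(pr_auc(y_true, y_score))                   # trapezoidal AUC-PR
print(average_precision_score(y_true, y_score))  # step-wise AP, differs
```

Note that roc_auc would score the ROC curve (true-positive rate vs false-positive rate), which is a different curve altogether, so it is not expected to match either PR-based number.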