I was performing cross-validation with xgboost.cv but then wanted to switch to cross_val_score so I could use it with GridSearchCV. Before moving on to hyperparameter tuning I checked whether the results from xgboost.cv and cross_val_score are similar, and found that there are huge differences.
I use xgboost.cv as:
params = {"objective":"binary:logistic",'colsample_bytree': 1,'learning_rate': 0.3, 'max_depth': 6, 'alpha': 0}
dmatrix = xgboost.DMatrix(table_X,table_y)
xgb_cv = xgboost.cv(dtrain=dmatrix, params=params, nfold=5,
num_boost_round=100, early_stopping_rounds=10, metrics="aucpr", as_pandas=True)
and the last row of xgb_cv is:
train-aucpr-mean | train-aucpr-std | test-aucpr-mean | test-aucpr-std
---|---|---|---
0.81 | 0.00 | 0.77 | 0.00
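Since early_stopping_rounds is set, I believe (my assumption, not something I have verified in the docs) the returned DataFrame is cut off at the best iteration, so I check how many boosting rounds were actually kept with:

# one row per boosting round that survived early stopping (if my assumption holds)
print(len(xgb_cv))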
For cross_val_score I use:
from sklearn.model_selection import StratifiedKFold, cross_val_score

xgb = xgboost.XGBClassifier(n_estimators=100, **params)
skf = StratifiedKFold(n_splits=5)
cross_val_scores = cross_val_score(xgb, table_X, table_y, scoring='average_precision', cv=skf)
And it ends up with a mean of 0.64. That is a worrisome difference. What am I doing wrong?
Secondly, the standard deviation of 0 for the xgboost.cv results also looks quite strange.
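In case it helps, this is a rough sketch of what I would consider a more like-for-like setup (my assumption, not tested): share the same StratifiedKFold splits between the two calls via the folds argument of xgboost.cv and drop early stopping, so both runs train the full 100 rounds on identical splits and only the metric implementations differ.

import xgboost
from sklearn.model_selection import StratifiedKFold, cross_val_score

params = {"objective": "binary:logistic", "colsample_bytree": 1, "learning_rate": 0.3, "max_depth": 6, "alpha": 0}
skf = StratifiedKFold(n_splits=5)

# xgboost.cv side: same splits, no early stopping, fixed 100 rounds
dmatrix = xgboost.DMatrix(table_X, table_y)
xgb_cv = xgboost.cv(dtrain=dmatrix, params=params, folds=skf,
                    num_boost_round=100, metrics="aucpr", as_pandas=True)

# sklearn side: same splits, same 100 rounds via n_estimators
xgb = xgboost.XGBClassifier(n_estimators=100, **params)
scores = cross_val_score(xgb, table_X, table_y, scoring="average_precision", cv=skf)

Even then, as far as I know scikit-learn's average_precision and XGBoost's aucpr are not computed in exactly the same way, so I would not expect the numbers to match to the last digit, just to be much closer than 0.77 versus 0.64.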