
xgb.cv and sklearn.model_selection.cross_validate do not produce the same mean train/test error, even though I set the same seed/random_state and make sure both methods use the same folds. The code at the bottom reproduces my issue. (Early stopping is off by default.)

I found that this issue is caused by the subsample parameter (both methods produce the same result if it is set to 1), but I cannot find a way to make both methods subsample in the same way. In addition to setting seed/random_state as shown in the code at the bottom, I also tried explicitly adding:

import random
random.seed(1)
np.random.seed(1)

at the beginning of my file but this does not resolve my issue either. Any ideas?
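For what it's worth, XGBoost keeps its own internal (C++-side) RNG seeded through its parameters, so Python-level `random.seed`/`np.random.seed` cannot influence its subsampling. The following is a plain-NumPy analogy (not XGBoost internals) of why a shared seed alone is not enough when two drivers consume the random stream differently:

```python
import numpy as np

# Analogy only: two consumers seeded identically but drawing from the
# stream in different patterns end up with different subsample masks.
n_rows, subsample = 100, 0.8

# Consumer A: one RNG shared across 5 folds (like a single cv driver)
rng_a = np.random.RandomState(1)
masks_a = [rng_a.rand(n_rows) < subsample for _ in range(5)]

# Consumer B: a fresh RNG per fold (like independent estimator fits)
masks_b = [np.random.RandomState(1).rand(n_rows) < subsample for _ in range(5)]

# The very first draw matches (same seed, same first call)...
assert (masks_a[0] == masks_b[0]).all()
# ...but later folds diverge, because consumer B restarts the stream.
assert not all((a == b).all() for a, b in zip(masks_a[1:], masks_b[1:]))
```

In this picture, xgb.cv resembles one driver looping over folds while cross_validate refits a fresh estimator per fold; whether XGBoost's two entry points actually differ in this way is an assumption here, not something I have verified in its source.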

import numpy as np
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.model_selection import cross_validate, StratifiedKFold

X = np.random.randn(100,20)
y = np.random.randint(0,2,100)
dtrain = xgb.DMatrix(X, label=y)

params = {'eta':0.3,
          'max_depth': 4,
          'gamma':0.1,
          'silent': 1,
          'objective': 'binary:logistic',
          'seed': 1,
          'subsample': 0.8
         }

cv_results = xgb.cv(params, dtrain, num_boost_round=99, seed=1,
                    folds=StratifiedKFold(5, shuffle=False),  # random_state is ignored (newer sklearn raises) when shuffle=False
                    early_stopping_rounds=10)
print(cv_results, '\n')

xgbc = XGBClassifier(learning_rate=0.3, 
                     max_depth=4, 
                     gamma=0.1, 
                     silent = 1,  
                     objective = 'binary:logistic',
                     subsample = 0.8,
                     random_state = 1,
                     n_estimators=len(cv_results))
scores = cross_validate(xgbc, X, y, 
                        cv=StratifiedKFold(5, shuffle=False),  # same folds as xgb.cv above
                        return_train_score=True)
print('train-error-mean = {}   test-error-mean = {}'.format(
             1-scores['train_score'].mean(), 1-scores['test_score'].mean()))

Output:

   train-error-mean  train-error-std  test-error-mean  test-error-std
0          0.214981         0.030880         0.519173        0.129533
1          0.140039         0.018552         0.549549        0.034696
2          0.105100         0.017420         0.510501        0.040517
3          0.092474         0.012587         0.450977        0.075866 

train-error-mean = 0.06994061572120636   test-error-mean = 0.4706015037593986

Output in case subsample is set to 1:

   train-error-mean  train-error-std  test-error-mean  test-error-std
0          0.180043         0.013266         0.491504        0.093246
1          0.117381         0.021328         0.488070        0.097733
2          0.074972         0.030605         0.530075        0.091446
3          0.044907         0.032232         0.519073        0.130802
4          0.032438         0.021816         0.481027        0.080622 

train-error-mean = 0.032438271604938285   test-error-mean = 0.4810275689223057

1 Answer


I know this for sure in the case of LightGBM, and from a quick look at the XGBoost code (here) it seems to behave similarly, so I assume the answer is relevant.

The trick is in the early stopping. The native xgb.cv picks a single stopping iteration at which the mean CV score (or something close to the mean, I forget the details by now :) ) reaches a plateau, while in sklearn cross-validation the models in each fold are trained independently, so early stopping happens at different iterations for different folds.

So, if you want to get identical results, disable early stopping (which is problematic, as you can over- or under-fit without being aware of it). If you want to use early stopping, there is no way to get identical results, due to the difference in implementations.
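A toy illustration of that difference (made-up metric values, not taken from the run above): picking one stopping round from the mean curve need not agree with letting each fold stop on its own.

```python
import numpy as np

# Toy per-fold validation error curves: rows = folds, cols = boosting rounds.
curves = np.array([
    [0.50, 0.40, 0.35, 0.36, 0.38],
    [0.55, 0.45, 0.44, 0.41, 0.43],
    [0.52, 0.47, 0.39, 0.40, 0.42],
])

# xgb.cv-style early stopping: one round chosen from the mean CV curve
best_joint = curves.mean(axis=0).argmin()  # -> 3

# Independent per-fold fits: each fold stops at its own best round
best_per_fold = curves.argmin(axis=1)      # -> [2, 3, 2]

print(best_joint, best_per_fold)
```

Two of the three folds would have stopped at round 2 on their own, while the joint criterion picks round 3, so the averaged scores differ between the two schemes.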

Mischa Lisovyi
    Thanks for taking the time to answer my question, but the `fit` method of `XGBClassifier` has early stopping turned off by default, so I do not apply early stopping in the `cross_validate` method. Moreover, if early stopping caused my issue, I would expect `xgb.cv` and `cross_validate` in the above code to give different results regardless of the value of the `subsample` parameter. To be absolutely sure, I tried turning off early stopping in `xgb.cv` and set the same number of iterations in both cases, but I still get different results when subsampling is turned on. – Maauss Dec 02 '18 at 00:08
    Your point makes sense; I did not fully understand the details of the question. Looking carefully, I do not see an obvious reason for this behaviour; your treatment of random states seems to be complete. – Mischa Lisovyi Dec 02 '18 at 12:16