
I use the xgboost sklearn interface below to create and train an xgb model (model-1):

import xgboost as xgb

clf = xgb.XGBClassifier(n_estimators=100, objective='binary:logistic')
clf.fit(x_train, y_train, early_stopping_rounds=10, eval_metric="auc",
    eval_set=[(x_valid, y_valid)])

And the same model can be created with the native xgboost API as model-2 below:

param = {}
param['objective'] = 'binary:logistic'
param['eval_metric'] = "auc"
num_rounds = 100
xgtrain = xgb.DMatrix(x_train, label=y_train)
xgval = xgb.DMatrix(x_valid, label=y_valid)
watchlist = [(xgtrain, 'train'), (xgval, 'val')]
model = xgb.train(param, xgtrain, num_rounds, watchlist, early_stopping_rounds=10)

I think all of the parameters are the same between model-1 and model-2, but the validation scores are different. Is there any difference between model-1 and model-2?

ybdesire
  • I have had the same issue. I spent a few hours going through all of the docs and all my code, set all of the parameters the same, and then trained. Still, I find that `xgb.XGBClassifier` gives 0.51 AUC and `xgb.train` gives 0.84 AUC. I have no idea why. – Little Bobby Tables Jul 13 '16 at 15:27
  • The `sklearn` interface does not have some of the options. For example, the `set_group` method of the `DMatrix` class, which is crucial for ranking, has no analog in the `sklearn` interface. – xolodec Oct 24 '16 at 06:54
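To make the preceding comment concrete, here is a minimal sketch of that ranking case, assuming x_train holds two queries spanning 3 and 2 rows (the group sizes are made up for illustration):

import xgboost as xgb

# Native API: a ranking objective needs query-group boundaries,
# which are set on the DMatrix itself.
xgtrain = xgb.DMatrix(x_train, label=y_train)
xgtrain.set_group([3, 2])  # first query spans 3 rows, second spans 2

rank_param = {'objective': 'rank:pairwise'}
ranker = xgb.train(rank_param, xgtrain, num_boost_round=100)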

3 Answers


As I understand it, there are many differences between the default parameters in native xgb and in its sklearn interface. For example: native xgb defaults to eta=0.3 while the sklearn interface defaults to 0.1. You can see more about the default parameters of each implementation here:

https://github.com/dmlc/xgboost/blob/master/doc/parameter.md
http://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn
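As a hedged sketch of how one might pin those defaults explicitly on both sides (the 0.3 value is the native default cited above):

# sklearn interface: override its 0.1 default explicitly
clf = xgb.XGBClassifier(n_estimators=100, objective='binary:logistic',
    learning_rate=0.3)

# native API: eta already defaults to 0.3, but pinning it makes the match explicit
param = {'objective': 'binary:logistic', 'eval_metric': 'auc', 'eta': 0.3}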

Du Phan
  • @gofr1: The essential part is that the two implementations have different default parameters. I also gave an example. Regards, – Du Phan Aug 22 '16 at 12:48

Results should be the same, as XGBClassifier is only a sklearn interface that in the end calls into the xgb library.

You can try adding the same seed to both approaches in order to get the same results. For example, in the sklearn interface:

clf = xgb.XGBClassifier(n_estimators=100, objective='binary:logistic', seed=1234)
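The counterpart in the native API would be to put the same seed into the parameter dict, e.g.:

param['seed'] = 1234  # matches the seed passed to XGBClassifier above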
Guiem Bosch

In my case, I set n_estimators of XGBRegressor in sklearn to 10, which corresponds to num_boost_round in original xgboost, and both showed the same result (it was a regression task, though). Other parameters were left at their defaults.

import xgboost as xgb

# 1: native API; num_boost_round defaults to 10
param = {
    'objective': 'reg:squarederror'
}
dtrain = xgb.DMatrix(x_train, label=y_train)
bst = xgb.train(param, dtrain)

# 2: sklearn interface; n_estimators=10 matches the default above
sk_xgb = xgb.XGBRegressor(objective="reg:squarederror", n_estimators=10)
sk_xgb.fit(x_train, y_train)

# 1 and 2 give the same result

My env was xgboost 1.3.0 and scikit-learn 0.24.1 on conda 4.9.2.

Try it.
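A quick way to check the claim, assuming a hypothetical held-out feature matrix x_test is available:

import numpy as np

dtest = xgb.DMatrix(x_test)  # x_test is assumed for this illustration

# The native booster predicts from a DMatrix, the sklearn wrapper from the raw array.
print(np.allclose(bst.predict(dtest), sk_xgb.predict(x_test)))  # True when the models match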

Jun Park