
I use the xgboost sklearn interface below to create and train an xgb model (model-1):

import xgboost as xgb

clf = xgb.XGBClassifier(n_estimators=100, objective='binary:logistic')
clf.fit(x_train, y_train, early_stopping_rounds=10, eval_metric="auc",
    eval_set=[(x_valid, y_valid)])

And the same model can be created with the native xgboost API as model-2 below:

param = {}
param['objective'] = 'binary:logistic'
param['eval_metric'] = "auc"
num_rounds = 100
xgtrain = xgb.DMatrix(x_train, label=y_train)
xgval = xgb.DMatrix(x_valid, label=y_valid)
watchlist = [(xgtrain, 'train'), (xgval, 'val')]
model = xgb.train(param, xgtrain, num_rounds, watchlist, early_stopping_rounds=10)

I think all of the parameters are the same between model-1 and model-2, but the validation scores are different. Is there any difference between model-1 and model-2?

ybdesire
  • I have had the same issue. I spent a few hours going through all of the docs and all my code, set all of the parameters the same, and then trained. Still, I find that `xgb.XGBClassifier` gives 0.51 AUC and `xgb.train` gives 0.84 AUC. I have no idea why. – Little Bobby Tables Jul 13 '16 at 15:27
  • The `sklearn` interface does not have some of the options. For example, the `set_group` method of the `DMatrix` class, which is crucial for ranking, has no analog in the `sklearn` interface. – xolodec Oct 24 '16 at 06:54
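To make the preceding comment concrete, here is a minimal sketch of that ranking case, assuming x_train holds two queries spanning 3 and 2 rows (the group sizes are made up for illustration):

import xgboost as xgb

# Native API: a ranking objective needs query-group boundaries,
# which are set on the DMatrix itself.
xgtrain = xgb.DMatrix(x_train, label=y_train)
xgtrain.set_group([3, 2])  # first query spans 3 rows, second spans 2

rank_param = {'objective': 'rank:pairwise'}
ranker = xgb.train(rank_param, xgtrain, num_boost_round=100)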

3 Answers


As I understand it, there are many differences between the default parameters in native xgb and in its sklearn interface. For example: native xgb defaults to eta=0.3 while the sklearn interface defaults to 0.1. You can see more about the default parameters of each implementation here:

https://github.com/dmlc/xgboost/blob/master/doc/parameter.md
http://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn
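As a hedged sketch of how one might pin those defaults explicitly on both sides (the 0.3 value is the native default cited above):

# sklearn interface: override its 0.1 default explicitly
clf = xgb.XGBClassifier(n_estimators=100, objective='binary:logistic',
    learning_rate=0.3)

# native API: eta already defaults to 0.3, but pinning it makes the match explicit
param = {'objective': 'binary:logistic', 'eval_metric': 'auc', 'eta': 0.3}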

Du Phan
  • @gofr1: The essential part is that the two implementations have different default parameters. I also gave an example. Regards, – Du Phan Aug 22 '16 at 12:48

Results should be the same, as XGBClassifier is only a sklearn interface that in the end calls into the xgb library.

You can try adding the same seed to both approaches in order to get the same results. For example, in the sklearn interface:

clf = xgb.XGBClassifier(n_estimators=100, objective='binary:logistic', seed=1234)
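The counterpart in the native API would be to put the same seed into the parameter dict, e.g.:

param['seed'] = 1234  # matches the seed passed to XGBClassifier above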
Guiem Bosch

In my case, I set n_estimators of XGBRegressor in sklearn to 10, which corresponds to num_boost_round in original xgboost, and both showed the same result (it was a regression task, though). Other parameters were left at their defaults.

import xgboost as xgb

# 1: native API; num_boost_round defaults to 10
param = {
    'objective': 'reg:squarederror'
}
dtrain = xgb.DMatrix(x_train, label=y_train)
bst = xgb.train(param, dtrain)

# 2: sklearn interface; n_estimators=10 matches the default above
sk_xgb = xgb.XGBRegressor(objective="reg:squarederror", n_estimators=10)
sk_xgb.fit(x_train, y_train)

# 1 and 2 give the same result

My env was xgboost 1.3.0 and scikit-learn 0.24.1 on conda 4.9.2.

Try it.
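A quick way to check the claim, assuming a hypothetical held-out feature matrix x_test is available:

import numpy as np

dtest = xgb.DMatrix(x_test)  # x_test is assumed for this illustration

# The native booster predicts from a DMatrix, the sklearn wrapper from the raw array.
print(np.allclose(bst.predict(dtest), sk_xgb.predict(x_test)))  # True when the models match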

Jun Park