
I'm new to XGBoost and I want to understand the details of its prediction process.

Here is my test case.

I found that the model.apply function returns, for every tree in the XGBoost model, the index of the leaf that an input falls into. The model.get_booster().get_dump() function returns the detailed decision rules as well as the score of each leaf node of every tree.

So I wrote a parser that maps every leaf node ID to its associated prediction score. Given an input, I first use model.apply(x) to find the leaf selected in every tree, look up the score of each of those leaves, and sum them up to get score1. Then I use model.predict(x) to get score2. As far as I understand, score1 and score2 should be equal, shouldn't they?

Why do I get such different results?

The following is my test code.

import xgboost as xgb
model = xgb.XGBRegressor(max_depth=5, learning_rate=0.1, n_estimators=500, silent=False, objective='reg:gamma')
model.load_model('my_checkpoint_file')
x = pandas_data.values  # pandas_data: my input DataFrame
x0 = x[0:1,:]  # first row only, kept 2-D
# ParseLeafScore: my parser; returns, per tree, a mapping from leaf node id to leaf score
dict_all_leaf_score = ParseLeafScore(model.get_booster().get_dump())
sum_leaf_score = 0.0
# model.apply(x0)[0] holds the selected leaf id in each tree
for idx, row in enumerate(model.apply(x0)[0]):
    leaf_score = dict_all_leaf_score[idx][row]
    sum_leaf_score += leaf_score
score = model.predict(x0)
print(score) # 0.32
print(sum_leaf_score) # -0.11
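
ParseLeafScore itself is not shown above. As a rough illustration only, a helper of that kind might look like the sketch below. It assumes the text dump format in which every leaf appears as `<node_id>:leaf=<value>`; the name `parse_leaf_scores` and the sample dump strings are hypothetical and are not taken from the real model.

```python
import re

# Matches leaf lines such as "\t1:leaf=-0.0733496" in a text-format tree dump.
LEAF_RE = re.compile(r"(\d+):leaf=([-+]?[0-9.]+(?:[eE][-+]?\d+)?)")

def parse_leaf_scores(dump):
    """Return one {leaf_node_id: leaf_score} dict per tree in the dump."""
    return [
        {int(node_id): float(score) for node_id, score in LEAF_RE.findall(tree)}
        for tree in dump
    ]

# Hand-written sample in the assumed text dump format (illustration only).
sample_dump = [
    "0:[f0<0.5] yes=1,no=2,missing=1\n\t1:leaf=-0.05\n\t2:leaf=0.02\n",
    "0:leaf=0.01\n",
]
leaf_scores = parse_leaf_scores(sample_dump)
```

With a structure like this, `leaf_scores[tree_index][leaf_id]` plays the role of `dict_all_leaf_score[idx][row]` in the loop above.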

Thank you all for helping me!!

pfc

1 Answer

model = xgb.XGBRegressor(max_depth=5, learning_rate=0.1, n_estimators=500, silent=False, objective='reg:gamma')

Your XGBoost regression model uses a non-linear objective function (reg:gamma), which works with a log link. The sum of the leaf scores is therefore on the log scale, and you must apply the exp() function to it to get the prediction. Also, don't forget to add the base score (aka intercept) before exponentiating.

score = exp(base_score + sum_leaf_score)
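
The arithmetic can be sketched with the standard library alone. This is a minimal illustration, not XGBoost's own code: the helper name `predict_from_leaves` and all numbers are hypothetical. One assumption worth checking against your own model: as far as I know, XGBoost keeps `base_score` on the output scale and, for log-link objectives such as reg:gamma, converts it with log() before adding it to the tree sum, so the intercept inside exp() would be log(base_score) rather than base_score itself. Comparing the hand-computed sum with model.predict(x0, output_margin=True), which returns the untransformed value, is one way to verify this.

```python
import math

def predict_from_leaves(leaf_scores, base_margin):
    """Combine per-tree leaf scores under a log link: exp(base_margin + sum)."""
    return math.exp(base_margin + sum(leaf_scores))

# Hypothetical numbers, not taken from the model in the question.
leaf_scores = [-0.05, 0.02, 0.01]  # one selected leaf score per tree
base_score = 0.5                   # hypothetical intercept on the output scale

# Assumption: with a log link, the intercept on the margin scale is log(base_score).
base_margin = math.log(base_score)

prediction = predict_from_leaves(leaf_scores, base_margin)
# Equivalent form: base_score * exp(sum of leaf scores).
```

If the hand-computed margin matches the output_margin=True value but exp() of it still differs from model.predict(x0), the link function is the thing to re-check; if the margins already differ, the leaf lookup or the intercept is the more likely culprit.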
user1808924
  • Also, some XGBoost booster algorithms (DART) use a weighted sum instead of a plain sum. – user1808924 Sep 03 '21 at 05:23
  • Do you mean that in this case model.predict(x0) should be equal to exp(sum_leaf_score)? I tried, but they are not equal. And what is the base_score? I found that the model has an attribute named intercept, but when I accessed it, the program told me it is not defined in this case. Thanks. – pfc Sep 03 '21 at 07:36
  • `score` should be equal to `exp(sum_leaf_scores + base_score)`. If you don't know what `base_score` stands for, then you should ask Google "XGBoost base_score". – user1808924 Sep 03 '21 at 18:41