I'm new to XGBoost and I want to understand the details of its prediction process.
Here is my test case.
I found that the model.apply function returns the leaf index that an input reaches in every single tree of the model, and that model.get_booster().get_dump() gives the detailed decision rules as well as the score of each leaf node of every tree.
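To make that concrete, this is roughly what the two calls give me (the dump text in the comment is only an illustrative shape, not my real model):

leaf_ids = model.apply(x)                     # ndarray of shape (n_samples, n_trees): leaf index hit in each tree
tree_dumps = model.get_booster().get_dump()   # list of strings, one text dump per tree, e.g.
                                              # "0:[f2<1.5] yes=1,no=2,missing=1\n\t1:leaf=0.043\n\t2:leaf=-0.017\n"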
So I wrote a parser that extracts every leaf node id and its associated prediction score. Given an input sample, I first use model.apply(x) to find the leaf reached in every tree, look up each leaf's score, and sum them up to get score1. Then I use model.predict(x) to get score2. As far as I understand, shouldn't score1 and score2 be equal? Why do I get such different results?
The following is my test code.
import xgboost as xgb
model = xgb.XGBRegressor(max_depth=5, learning_rate=0.1, n_estimators=500, silent=False, objective='reg:gamma')
model.load_model('my_checkpoint_file')
x = pandas_data.values
x0 = x[0:1,:]
dict_all_leaf_score = ParseLeafScore(model.get_booster().get_dump())
sum_leaf_score = 0.0
for idx, leaf_id in enumerate(model.apply(x0)[0]):  # leaf reached in tree number idx
    leaf_score = dict_all_leaf_score[idx][leaf_id]
    sum_leaf_score += leaf_score
score = model.predict(x0)
print(score) # 0.32
print(sum_leaf_score) # -0.11
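As a sanity check (assuming my version of the sklearn wrapper forwards output_margin to the underlying booster, which I have not verified), I was also going to compare the sum against the raw, untransformed margin:

# Hypothetical check: the raw margin, before any objective transformation,
# is what I expected a plain sum of leaf values to correspond to.
raw_margin = model.predict(x0, output_margin=True)
print(raw_margin)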
Thank you all for helping me!!