
I am trying to find a way to calculate the value at each decision node for the trees in an XGBClassifier. I know this can be done for sklearn tree methods such as RandomForest, DecisionTree, etc.


I found that XGBoost's get_dump method only shows values for the leaf nodes. The goal is to find the contribution of each feature in the tree to the outcome, i.e. Outcome = bias + contribution(feature_1) + … + contribution(feature_n).

A similar example is here: https://blog.datadive.net/interpreting-random-forests/
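For reference, the decomposition in the linked post (often called the Saabas method) can be written out for a single tree. The notation is mine, not the post's: n_0 is the root, n_D the leaf that sample x ends up in, v(n) the value attached to node n, phi(n) the feature that node n splits on, and c(n) its cover. Because the XGBoost dumps only store leaf weights, the last line (internal node value = cover-weighted mean of its two children) is an assumed convention; it matches my reading of the eli5 code linked in one of the answers.

```latex
\begin{aligned}
f(x) &= v(n_D) = v(n_0) + \sum_{d=1}^{D} \bigl( v(n_d) - v(n_{d-1}) \bigr) \\
\mathrm{contrib}_j(x) &= \sum_{d \,:\, \phi(n_{d-1}) = j} \bigl( v(n_d) - v(n_{d-1}) \bigr) \\
f(x) &= \underbrace{v(n_0)}_{\text{bias}} + \sum_{j=1}^{n} \mathrm{contrib}_j(x) \\
v(n) &= \frac{c(n_{\text{yes}})\, v(n_{\text{yes}}) + c(n_{\text{no}})\, v(n_{\text{no}})}{c(n_{\text{yes}}) + c(n_{\text{no}})} \quad \text{for an internal node } n
\end{aligned}
```

For a boosted model the per-tree terms simply add up: the bias becomes the base margin plus the sum of the root values over all trees, and each feature's contribution is summed across trees. For an XGBClassifier these quantities are in log-odds (margin) space, not probability space.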


There are two approaches you can use:

  1. XGBoost's pred_contribs parameter:
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
model = xgb.XGBClassifier()
model.fit(X, y)

# Per-feature contributions for the first sample: one column per feature plus a
# final bias column; each row sums to the raw margin (log-odds) prediction.
model.get_booster().predict(xgb.DMatrix(X[[0], :]), pred_contribs=True)
# array([[ 0.01069788,  0.5564505 ,  0.        , -0.18979889, -0.12519032,
#          0.0419167 , -0.04723313, -1.0029427 ,  0.00691214,  0.0219936 ,
#         -0.24528527, -0.01379179, -0.08468378, -0.870576  ,  0.00372334,
#          0.11168513, -0.01564308,  0.01463292,  0.03310397,  0.01167833,
#         -0.70834243,  1.7004861 , -0.53104496, -1.2110311 , -0.2576103 ,
#         -0.07476614, -0.40844473, -1.656179  , -0.02658875, -0.03065292,
#          0.9892135 ]], dtype=float32)
  2. eli5's implementation: https://github.com/eli5-org/eli5/blob/69637074ad07cdb0d0aa3a650302a721c9272e4b/eli5/xgboost.py#L239
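To make the second approach concrete, here is a minimal pure-Python sketch of a Saabas-style walk over one tree. It assumes the tree is a dict in the shape returned by json.loads on one string from Booster.get_dump(dump_format="json", with_stats=True) (internal nodes with "split", "split_condition", "yes", "no", "missing", "cover", "children"; leaves with "leaf" and "cover"). Valuing internal nodes as the cover-weighted mean of their children is an assumed convention, and node_value / tree_contributions are hypothetical helper names, not part of XGBoost or eli5.

```python
import json
from collections import defaultdict


def node_value(node):
    """Leaf weight for a leaf; cover-weighted mean of the children's values otherwise.

    The cover weighting is an assumed convention: XGBoost's dumps only store
    weights for leaves.
    """
    if "leaf" in node:
        return node["leaf"]
    children = node["children"]
    total_cover = sum(child["cover"] for child in children)
    return sum(child["cover"] * node_value(child) for child in children) / total_cover


def tree_contributions(tree, x):
    """Walk one tree for sample x and return (bias, {feature: contribution}).

    x maps feature names as they appear in the dump (e.g. "f0") to values;
    a feature absent from x is treated as missing.
    """
    contributions = defaultdict(float)
    node = tree
    bias = current = node_value(node)
    while "leaf" not in node:
        feature = node["split"]
        value = x.get(feature)
        if value is None:
            next_id = node["missing"]  # missing values follow the default branch
        elif value < node["split_condition"]:
            next_id = node["yes"]  # XGBoost sends value < threshold to "yes"
        else:
            next_id = node["no"]
        node = next(c for c in node["children"] if c["nodeid"] == next_id)
        child_value = node_value(node)
        # Credit the change in node value to the feature that was split on.
        contributions[feature] += child_value - current
        current = child_value
    return bias, dict(contributions)


# Hand-written tree in the get_dump(dump_format="json", with_stats=True) shape;
# the numbers are illustrative, not from a real model.
TREE_JSON = """
{"nodeid": 0, "split": "f0", "split_condition": 0.5, "yes": 1, "no": 2, "missing": 1, "cover": 4.0,
 "children": [
   {"nodeid": 1, "leaf": -0.2, "cover": 3.0},
   {"nodeid": 2, "split": "f1", "split_condition": 1.0, "yes": 3, "no": 4, "missing": 3, "cover": 1.0,
    "children": [{"nodeid": 3, "leaf": 0.4, "cover": 0.5}, {"nodeid": 4, "leaf": 0.6, "cover": 0.5}]}
 ]}
"""

tree = json.loads(TREE_JSON)
bias, contribs = tree_contributions(tree, {"f0": 0.7, "f1": 2.0})
# bias + sum(contribs.values()) matches the leaf weight 0.6 up to float rounding
print(bias, contribs)
```

On a trained model one would run this over every string in model.get_booster().get_dump(dump_format="json", with_stats=True), name the inputs f0, f1, … when the model was fit on a NumPy array, and add the per-tree results. The total plus the base margin should reproduce predict(..., output_margin=True), and the per-feature totals can be compared with predict(..., pred_contribs=True, approx_contribs=True), which to my understanding is XGBoost's built-in Saabas-style approximation. Note that cover is a sum of Hessians, so for logistic loss it is not a sample count.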
Joey Gao

Maybe this: it doesn't draw a tree, but it gives you the gain and cover values for each node (bst is a Booster; for an XGBClassifier use model.get_booster()):

bst.dump_model('dump.raw.txt', with_stats=True)

https://xgboost.readthedocs.io/en/stable/python/python_api.html?highlight=plot%20gain#xgboost.Booster.dump_model
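The file written by dump_model is plain text, so the per-node statistics can be read back with a small parser. This is a sketch under stated assumptions: it expects the usual text layout ("booster[i]:" headers, internal nodes like "0:[f2<2.45] yes=1,no=2,missing=1,gain=…,cover=…", leaves like "1:leaf=0.4,cover=50"), handles numeric splits only, and parse_dump is a hypothetical helper name. The sample dump below is hand-written for illustration.

```python
def _convert(key, raw):
    """Child node ids become ints; leaf weights, gain and cover become floats."""
    return int(raw) if key in ("yes", "no", "missing") else float(raw)


def parse_dump(text):
    """Parse a with_stats=True text dump into a list of {node_id: node_dict}, one per tree.

    Assumes numeric splits of the form "[feature<threshold]"; categorical
    splits are not handled.
    """
    trees = []
    for raw_line in text.splitlines():
        line = raw_line.strip()  # indentation (tabs) only encodes depth
        if not line:
            continue
        if line.startswith("booster["):
            trees.append({})  # dump_model writes a "booster[i]:" header per tree
            continue
        if not trees:
            trees.append({})  # a single get_dump() string has no header
        node_id, rest = line.split(":", 1)
        if rest.startswith("leaf="):
            node = {"is_leaf": True}
            pairs = rest
        else:
            close = rest.index("]")
            feature, threshold = rest[1:close].rsplit("<", 1)
            node = {"is_leaf": False, "feature": feature, "threshold": float(threshold)}
            pairs = rest[close + 1:].strip()
        for pair in pairs.split(","):
            key, value = pair.split("=", 1)
            node[key] = _convert(key, value)
        trees[-1][int(node_id)] = node
    return trees


# Illustrative dump with two trees (not output from a real model).
SAMPLE = (
    "booster[0]:\n"
    "0:[f2<2.45000005] yes=1,no=2,missing=1,gain=63.5,cover=150\n"
    "\t1:leaf=0.4,cover=50\n"
    "\t2:leaf=-0.2,cover=100\n"
    "booster[1]:\n"
    "0:leaf=0.1,cover=150\n"
)

for i, tree in enumerate(parse_dump(SAMPLE)):
    print(i, tree)
```

With a real model you would read 'dump.raw.txt' and pass its contents to parse_dump; summing node["gain"] per node["feature"] over the internal nodes then gives a per-feature total gain, and the leaf weights plus covers are enough to derive internal node values if you adopt a cover-weighted convention.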

Newcoder

For an XGBoost Classifier model:

trees = model.get_booster().trees_to_dataframe()

trees_to_dataframe() collects the information about every node of every tree (feature, split threshold, Yes/No/Missing child IDs, gain and cover) into a single pandas DataFrame with one row per node.

[Image: output of trees_to_dataframe() on a typical model]

You can then filter the frame for the information you need, e.g. only the rows of one tree or only the leaf rows.
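As one example of working with that frame, here is a sketch that attaches a value to every node, which is what the question asks for. It assumes the columns are named ID, Feature, Yes, No, Gain and Cover, that leaf rows have Feature == "Leaf" and keep the leaf weight in Gain, and that an internal node's value is the cover-weighted mean of its Yes and No children (a convention, not something XGBoost stores). add_node_values is a hypothetical helper, and the tiny frame stands in for real trees_to_dataframe() output.

```python
import pandas as pd


def add_node_values(trees: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of a trees_to_dataframe()-style frame with an extra 'Value' column.

    Assumptions: leaf rows have Feature == "Leaf" and hold the leaf weight in
    'Gain'; an internal node's value is the cover-weighted mean of its
    'Yes' and 'No' children.
    """
    by_id = trees.set_index("ID")
    values = {}  # memoised node values keyed by ID such as "0-3"

    def value(node_id):
        if node_id not in values:
            row = by_id.loc[node_id]
            if row["Feature"] == "Leaf":
                values[node_id] = float(row["Gain"])
            else:
                yes_cover = by_id.at[row["Yes"], "Cover"]
                no_cover = by_id.at[row["No"], "Cover"]
                values[node_id] = float(
                    (yes_cover * value(row["Yes"]) + no_cover * value(row["No"]))
                    / (yes_cover + no_cover)
                )
        return values[node_id]

    out = trees.copy()
    out["Value"] = [value(node_id) for node_id in out["ID"]]
    return out


# Tiny hand-made frame with the assumed columns; the numbers are illustrative only.
example = pd.DataFrame({
    "Tree": [0, 0, 0],
    "Node": [0, 1, 2],
    "ID": ["0-0", "0-1", "0-2"],
    "Feature": ["f0", "Leaf", "Leaf"],
    "Split": [0.5, None, None],
    "Yes": ["0-1", None, None],
    "No": ["0-2", None, None],
    "Missing": ["0-1", None, None],
    "Gain": [10.0, -0.2, 0.6],
    "Cover": [2.0, 1.0, 1.0],
})
print(add_node_values(example)[["ID", "Feature", "Value"]])
```

Applied to model.get_booster().trees_to_dataframe(), the difference between a child's Value and its parent's Value along a sample's path is the contribution credited to the parent's split feature, which gives the bias + contributions breakdown from the question.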

Hope that helps!

Dev Bhuyan