
How do I access the individual trees of an xgboost model in Python/R?

Below is how I get the individual trees of a Random Forest in sklearn:

from sklearn.ensemble import RandomForestRegressor

estimator = RandomForestRegressor(
    oob_score=True,
    n_estimators=10,
    max_features='auto'
)
estimator.fit(training_data, training_target)

# Each fitted tree is available in estimators_
tree1 = estimator.estimators_[0]
leftChild = tree1.tree_.children_left
rightChild = tree1.tree_.children_right
Miguel Trejo
ashis
  • I would like an answer to this as well, since it is necessary for a confidence interval. I know that once you have trained the boosted model `bst`, you can simply call `bst.predict(data, pred_leaf=True)`. The output will be a matrix of shape `(n_samples, n_estimators)`, with each entry giving the predicted leaf index of that sample in that tree, but I do not know how to recover the actual prediction of each tree. – michel Oct 21 '16 at 04:08
  • Did you guys figure it out? – None Jun 16 '17 at 18:57
  • Here it shows how to do this: https://stackoverflow.com/questions/43702514/how-to-get-each-individual-trees-prediction-in-xgboost/69135256#69135256 – Raul Jun 16 '22 at 20:42

1 Answer


Do you want to inspect the trees?

In Python, you can dump the trees as a list of strings:

import xgboost as xgb

# X, y: your training data
m = xgb.XGBClassifier(max_depth=2, n_estimators=3).fit(X, y)
m.get_booster().get_dump()

>

['0:[sincelastrun<23.2917] yes=1,no=2,missing=2\n\t1:[sincelastrun<18.0417] yes=3,no=4,missing=4\n\t\t3:leaf=-0.0965415\n\t\t4:leaf=-0.0679503\n\t2:[sincelastrun<695.025] yes=5,no=6,missing=6\n\t\t5:leaf=-0.0992546\n\t\t6:leaf=-0.0984374\n',
 '0:[sincelastrun<23.2917] yes=1,no=2,missing=2\n\t1:[sincelastrun<16.8917] yes=3,no=4,missing=4\n\t\t3:leaf=-0.0928132\n\t\t4:leaf=-0.0676056\n\t2:[sincelastrun<695.025] yes=5,no=6,missing=6\n\t\t5:leaf=-0.0945284\n\t\t6:leaf=-0.0937463\n',
 '0:[sincelastrun<23.2917] yes=1,no=2,missing=2\n\t1:[sincelastrun<18.175] yes=3,no=4,missing=4\n\t\t3:leaf=-0.0878571\n\t\t4:leaf=-0.0610089\n\t2:[sincelastrun<695.025] yes=5,no=6,missing=6\n\t\t5:leaf=-0.0904395\n\t\t6:leaf=-0.0896808\n']

Or dump them to a file (with nice formatting):

m.get_booster().dump_model("out.txt")

>

booster[0]:
0:[sincelastrun<23.2917] yes=1,no=2,missing=2
    1:[sincelastrun<18.0417] yes=3,no=4,missing=4
        3:leaf=-0.0965415
        4:leaf=-0.0679503
    2:[sincelastrun<695.025] yes=5,no=6,missing=6
        5:leaf=-0.0992546
        6:leaf=-0.0984374
booster[1]:
0:[sincelastrun<23.2917] yes=1,no=2,missing=2
    1:[sincelastrun<16.8917] yes=3,no=4,missing=4
        3:leaf=-0.0928132
        4:leaf=-0.0676056
    2:[sincelastrun<695.025] yes=5,no=6,missing=6
        5:leaf=-0.0945284
        6:leaf=-0.0937463
booster[2]:
0:[sincelastrun<23.2917] yes=1,no=2,missing=2
    1:[sincelastrun<18.175] yes=3,no=4,missing=4
        3:leaf=-0.0878571
        4:leaf=-0.0610089
    2:[sincelastrun<695.025] yes=5,no=6,missing=6
        5:leaf=-0.0904395
        6:leaf=-0.0896808
pomber
  • And how does one use each tree separately to make a classification and evaluate each tree? – lesolorzanov Feb 02 '21 at 19:10
  • An easier-to-read option is `model.get_booster().trees_to_dataframe()`, which returns the same information as a pandas DataFrame. – Markus Jul 29 '21 at 16:52
  • Why would you want to use an individual tree? The trees are grown sequentially, each reducing the overall error of the model; they are never used individually the way trees in a random forest are. – mnky9800n Oct 28 '21 at 10:12