I'm trying to visualize my XGBoost model using the Python plot_tree method. The first few trees end up as a picture showing only leaf = -0.1, while the rest look fine. How do I interpret this? Does this mean I'm using more trees than needed?
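For reference, this is roughly what I'm doing (a minimal sketch with synthetic data, since my actual model and data aren't shown here):

    import xgboost as xgb
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification

    # Synthetic two-class data purely for illustration.
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    model = xgb.XGBClassifier(n_estimators=10)
    model.fit(X, y)

    # Draw the tree at index 0; num_trees selects which tree to plot.
    xgb.plot_tree(model, num_trees=0)
    plt.show()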
For a classification tree with two classes {0, 1}, the value of a leaf node represents the raw (log-odds) score for class 1. It can be converted to a probability with the logistic function:
1/(1 + np.exp(-(-0.1))) = 0.47502081252106
This means that if a data point ends up in this leaf, the probability of it being class 1 is about 0.475.
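As a quick sanity check of that arithmetic (plain NumPy, nothing specific to your model):

    import numpy as np

    def leaf_to_prob(leaf_value):
        # Logistic (sigmoid) function: maps a raw log-odds score to a probability.
        return 1.0 / (1.0 + np.exp(-leaf_value))

    print(leaf_to_prob(-0.1))  # -> 0.47502081252106...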

Allen Qin
- Thank you for your answer. However, the weird thing I found is that those trees have no splitting point and no leaf other than this one. Does this mean every data point gets a probability of 0.475 of being class 1 by default? – Ethan Le Dec 19 '18 at 03:04
- It depends on the value of the leaf. In the example you gave, the leaf value is -0.1, which means the probability of this node being class 1 is 0.475. If the leaf value is different, you can plug it into the logistic function above to calculate the probability. – Allen Qin Dec 19 '18 at 05:05