I'm trying to visualize my XGBoost model using the Python plot_tree method. The first few trees end up as a picture showing leaf = -0.1, while the rest seem fine. How do I interpret this? Does this mean I am using more trees than needed?

Ethan Le

1 Answer

For a binary classification tree with classes {0, 1}, the value of a leaf node represents the raw score (log-odds) for class 1. It can be converted to a probability using the logistic function:

1/(1+np.exp(-1*-0.1))=0.47502081252106

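This means that if a data point ends up in this leaf, its probability of being class 1 is 0.47502081252106. This treats the leaf in isolation; the model's final prediction applies the logistic function to the sum of the leaf values the data point reaches across all trees.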
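The conversion above can be sketched as a small helper. This is a minimal illustration, not XGBoost's own code: the function name `leaf_to_probability` and the list `example_leaves` are hypothetical, the extra leaf values are made up for the example, and the standard-library `math.exp` is used in place of `np.exp` so the snippet has no dependencies. It also ignores any base-score offset the model may add to the raw score.

```python
import math


def leaf_to_probability(raw_score):
    """Apply the logistic (sigmoid) function to a raw score (log-odds)."""
    return 1.0 / (1.0 + math.exp(-raw_score))


# Single leaf from the question: leaf = -0.1
single = leaf_to_probability(-0.1)
print(single)  # about 0.475

# Assumption for illustration: a data point reaches these leaves in three trees.
# The raw scores are summed first, then the logistic function is applied once.
example_leaves = [-0.1, 0.25, 0.05]
combined = leaf_to_probability(sum(example_leaves))
print(combined)  # about 0.55
```

A raw score of 0 maps to a probability of exactly 0.5, so a leaf value of -0.1 nudges the prediction slightly toward class 0.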

Allen Qin
  • Thank you for your answer. However, the odd thing I found is that those trees have no split point and no leaf other than this one. Does this mean every data point has a probability of 0.47502081252106 of being class 1 by default? – Ethan Le Dec 19 '18 at 03:04
  • It depends on the value of the leaf. In the example you gave, the leaf value is -0.1, which means the probability of class 1 at this node is 0.475. If the leaf value is different, you can use the logistic function above to calculate the probability. – Allen Qin Dec 19 '18 at 05:05