
I can export the structure of a GBDT to an image with the `tree.export_graphviz` function:

``` Python3
from subprocess import check_call

from sklearn.datasets import load_iris
from sklearn import tree
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(n_estimators=1)  # set to 1 for the sake of simplicity
iris = load_iris()

clf = clf.fit(iris.data, iris.target)
tree.export_graphviz(clf.estimators_[0, 0], out_file='tree.dot')
check_call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png'])
```

This is the image I obtained.

I am wondering what the values on the leaves are, and how I can obtain them.

I have tried the `apply` and `decision_function` functions, but neither works.

  • Value is the number of samples belonging to that or its child nodes. – Vivek Kumar Nov 24 '17 at 02:30
  • @VivekKumar Instead of `samples`, I mean `value` which can be negative. Just check the above image. – user5594832 Nov 24 '17 at 02:41
  • Ohk. What do you get when you do `export_graphviz(..., proportion=True)`? Looks like it's the weight of those samples. – Vivek Kumar Nov 24 '17 at 02:50
  • Yes, the `samples` line would be the weights of samples. But what I am asking about is the last line (`value=2.0`, `value=-1.0`, ...). Please read the above figure before commenting again, thanks. – user5594832 Nov 24 '17 at 06:02
  • I get that. And I am talking about that only. Those two things, `samples` and `value`, are related. And I am saying that `value` is the weight on that node for those samples; `samples` represents the number of samples. – Vivek Kumar Nov 24 '17 at 06:10
  • If someone is having difficulty using `check_call()`, you can import this function by typing `from subprocess import check_call` at the top of your python file to achieve this. – aysljc Oct 31 '22 at 01:04
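Following the suggestions in these comments, a minimal sketch (the `tree_prop.*` file names are only illustrative) that imports `check_call` from `subprocess` and re-exports the same tree with `proportion=True`:

``` Python3
from subprocess import check_call
from sklearn import tree

# Re-export the same tree with proportion=True so the 'samples' line shows
# fractions of the training set instead of raw counts.
tree.export_graphviz(clf.estimators_[0, 0], out_file='tree_prop.dot', proportion=True)
check_call(['dot', '-Tpng', 'tree_prop.dot', '-o', 'tree_prop.png'])
```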

1 Answer


You can access the leaf properties of each individual tree through its internal `tree_` object and its attributes. `export_graphviz` uses exactly this approach.

Consider this code. For each attribute, it gives an array of its values over all the tree nodes:

``` Python3
print(clf.estimators_[0, 0].tree_.feature)
print(clf.estimators_[0, 0].tree_.threshold)
print(clf.estimators_[0, 0].tree_.children_left)
print(clf.estimators_[0, 0].tree_.children_right)
print(clf.estimators_[0, 0].tree_.n_node_samples)
print(clf.estimators_[0, 0].tree_.value.ravel())
```

The output will be

```
[ 2 -2 -2]
[ 2.45000005 -2.         -2.        ]
[ 1 -1 -1]
[ 2 -1 -1]
[150  50 100]
[  3.51570624e-16   2.00000000e+00  -1.00000000e+00]
```

That is, your tree has 3 nodes, and the first one compares the value of feature 2 with 2.45, etc.
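As a minimal sketch, assuming `clf` is the classifier fitted in the question, you can walk these arrays to print the tree structure (a child index of -1 marks a leaf):

``` Python3
t = clf.estimators_[0, 0].tree_

def print_node(node=0, depth=0):
    indent = '    ' * depth
    if t.children_left[node] == -1:  # -1 marks a leaf
        print(f'{indent}leaf: value={t.value[node, 0, 0]:.3f}, samples={t.n_node_samples[node]}')
    else:
        print(f'{indent}node: X[{t.feature[node]}] <= {t.threshold[node]:.3f}')
        print_node(t.children_left[node], depth + 1)
        print_node(t.children_right[node], depth + 1)

print_node()
```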

The values in the root node and in the left and right leaves are 3e-16, 2, and -1, respectively.

These values, however, are not straightforward to interpret, because each tree was fitted to predict the gradient of the GBDT loss function.
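One way to see how these leaf values enter the model is a small sketch under the assumption that the decision function is the learning-rate-scaled sum of the trees' outputs plus an initial prediction; setting `init='zero'` removes the initial term so the two sides can be compared directly (the classifier refitted here is only for illustration):

``` Python3
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

iris = load_iris()
# init='zero' drops the prior term, so the decision function reduces to the
# learning-rate-scaled leaf values of the trees (one tree per class here).
clf = GradientBoostingClassifier(n_estimators=1, init='zero').fit(iris.data, iris.target)

manual = np.column_stack([
    clf.learning_rate * clf.estimators_[0, k].predict(iris.data)
    for k in range(clf.n_classes_)
])
print(np.allclose(manual, clf.decision_function(iris.data)))  # expected: True
```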

David Dale