What is the meaning of "value" in a node in sklearn decisiontree plot_tree

Question

I plotted my sklearn decision tree using the plot_tree function. The nodes have the following structure:

But I don't understand what value = [2417, 1059] means. Other nodes show other values. Thanks for explaining.

snzai · Answer 1 · 2021-11-19T18:11:58.000

`DecisionTreeClassifier`:

In a DecisionTreeClassifier, value is the number of training samples of each class that reach the node.

Keep in mind that these counts are weighted if you supplied class weights (the class_weight parameter) or sample weights (sample_weight in the call to fit()).

For example:

(Image: a decision tree classifier.)

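The class weights used for this tree were:

>>> cw = {0: 0.6495288248337029, 1: 2.1719184430027805}

Taking the node on the True branch, the unweighted class counts are recovered by dividing each weighted value by its class weight (rounded, because the displayed values are truncated to three decimals):

>>> [round(3819.229 / cw[0]), round(1216.274 / cw[1])]
[5880, 560]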

The impurity criterion (entropy in this example) is likewise calculated on the weighted split:

>>> import math
>>> a, b = 3819.229, 1216.274
>>> ab = a + b
>>> (-(a / ab)*math.log2(a / ab)) - ((b / ab)*math.log2(b / ab))
0.7975914228753467
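
As a minimal sketch, the two calculations above can be put side by side using only the standard library. The helper names `weighted_value` and `entropy` are made up for illustration and are not part of scikit-learn; the raw counts and class weights are the ones quoted in this answer.

```python
import math

# Class weights and raw (unweighted) class counts quoted in this answer.
cw = {0: 0.6495288248337029, 1: 2.1719184430027805}
raw_counts = [5880, 560]


def weighted_value(counts, class_weight):
    """Scale each class count by its class weight (hypothetical helper)."""
    return [n * class_weight[c] for c, n in enumerate(counts)]


def entropy(values):
    """Base-2 Shannon entropy of the class proportions implied by `values`."""
    total = sum(values)
    return -sum((v / total) * math.log2(v / total) for v in values if v > 0)


value = weighted_value(raw_counts, cw)

print([round(v, 3) for v in value])    # weighted counts, as displayed in the node
print(round(entropy(value), 4))        # entropy computed on the weighted counts
print(round(entropy(raw_counts), 4))   # entropy on the raw counts, for comparison
```

The weighted and unweighted entropies differ, which is why the impurity shown in a node cannot be reproduced from the raw class counts when class weights are in use.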

`DecisionTreeRegressor`:

In a DecisionTreeRegressor, value is the prediction the tree would make for a new example falling in that node. If your criterion is MSE (squared_error), value is the mean of the target values of the training samples in that node.

For example:

(Image: a decision tree regressor.)

(Data: Seaborn's "dots" example set.)

A depth-1 regressor tree fitted on coherence to predict firing_rate. It's not a very useful tree, but it illustrates the idea.

Taking the true node, value is calculated as:

>>> value = data[data.coherence <= 19.2].firing_rate.mean()
>>> value
40.48326118418657

squared_error for that node is:

>>> ((data[data.coherence <= 19.2].firing_rate - value)**2).mean()
134.6504380931471
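
The same check can be run against a fitted tree instead of the plot. The sketch below is only illustrative: it uses synthetic stand-in data (an assumption, not the "dots" set) and reads the per-node numbers from the fitted estimator's `tree_` attribute, comparing them with a manual mean and mean squared error.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption): one feature, one noisy target.
rng = np.random.default_rng(0)
x = np.repeat([0.0, 3.2, 6.4, 12.8, 25.6, 51.2], 20)
y = 30.0 + 0.5 * x + rng.normal(scale=5.0, size=x.size)
X = x.reshape(-1, 1)

# Default criterion (squared error), depth 1: a root and two leaves.
reg = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)
tree = reg.tree_

left = tree.children_left[0]         # child reached when the root test is True
in_left = reg.apply(X) == left       # training samples that land in that leaf

node_value = tree.value[left, 0, 0]  # what the plot labels "value"
node_impurity = tree.impurity[left]  # what the plot labels with the criterion name

manual_mean = y[in_left].mean()
manual_mse = ((y[in_left] - manual_mean) ** 2).mean()

print(node_value, manual_mean)
print(node_impurity, manual_mse)
```

Using `apply` to find the samples in the node avoids re-deriving the split threshold by hand.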

score 2 · Accepted Answer · answered Jan 14 '21 at 13:25


They indicate the number of samples of each class that you have at that node.

For example, your picture shows that before splitting on "hops<=5" you have 2417 samples of class 0 and 1059 samples of class 1.

Note that if you sum these two values, you obtain the same number (3476) as the "samples" entry.

If the tree works well, you will see the classes separate more cleanly at every step. In a final leaf you will see clear values like [300, 2], and you can then say that samples reaching that leaf are classified as class 0.
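
To see this on a fitted tree without the plot, here is a small sketch on the iris data bundled with scikit-learn (chosen only for illustration). It counts the classes reaching each node via `decision_path` and compares them with the tree's own bookkeeping. As far as I know, the raw `tree_.value` array holds either counts or per-node fractions depending on the scikit-learn version, so the sketch normalizes each row before comparing.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
tree = clf.tree_

# Boolean matrix: entry [i, n] is True if training sample i passes through node n.
reaches = clf.decision_path(X).toarray().astype(bool)

# Per-node class counts, computed by hand from the training labels.
counts = np.array([
    np.bincount(y[reaches[:, n]], minlength=clf.n_classes_)
    for n in range(tree.node_count)
])

for n in range(tree.node_count):
    print(f"node {n}: samples = {tree.n_node_samples[n]}, value = {counts[n].tolist()}")

# Class proportions per node: by hand, and from the stored tree_.value rows.
count_props = counts / counts.sum(axis=1, keepdims=True)
stored = tree.value[:, 0, :]
stored_props = stored / stored.sum(axis=1, keepdims=True)
```

Each row of `counts` sums to the node's "samples" entry, and the root row is simply the class distribution of the whole training set.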

answered Jan 14 '21 at 13:25

Alex Serra Marrugat


any idea what it means in a `DecisionTreeRegressor`, esp. for internal nodes? – zyxue Nov 19 '21 at 16:44

turns out it may depend on the criterion used. For MAE, it's [median](https://github.com/scikit-learn/scikit-learn/blob/6cdffd860c49d30d3b9fa72c5fc1174e8eeaa35e/sklearn/tree/_criterion.pyx#L1194-L1198) – zyxue Nov 19 '21 at 17:28
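
A quick standard-library illustration of why the criterion matters for a regressor's node value: the mean minimizes squared error, while the median minimizes absolute error. The target values below are made up, and `mse` and `mae` are hypothetical helpers.

```python
import statistics

# Made-up target values for the samples in one node, with one outlier.
targets = [1.0, 2.0, 3.0, 4.0, 100.0]


def mse(values, prediction):
    """Mean squared error of a constant prediction."""
    return sum((v - prediction) ** 2 for v in values) / len(values)


def mae(values, prediction):
    """Mean absolute error of a constant prediction."""
    return sum(abs(v - prediction) for v in values) / len(values)


mean = statistics.mean(targets)
median = statistics.median(targets)

print("mean:", mean, "median:", median)
print("MSE at mean vs median:", mse(targets, mean), mse(targets, median))
print("MAE at mean vs median:", mae(targets, mean), mae(targets, median))
```

With an outlier in the node, the two candidate values differ noticeably, so the number shown as value depends on which criterion the tree was trained with.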
