Obtain the number of occurrences a decision tree path has been used when classifying data

Question

I am trying to count how many times each decision tree path is used when classifying instances.

For example, suppose I have the following rules (not sure if they make sense):

Rule 1: [x<3 and y<5 => 'Low']
Rule 2: [x<3 and x>1 and y<5 => 'Low']
Rule 3: [x<3 and y>2 and y<5 => 'Low']
Rule 4: [x<6 and y<8 => 'Medium']
Rule 5: [x<10 and y<10 => 'High']

Now, suppose I have 10 test set samples. Given this test set and the rules above, I want output like this:

Rule 1 has been used 2 times,
Rule 2 has been used 2 times,
Rule 3 has been used 1 time,
Rule 4 has been used 3 times,
and Rule 5 has been used 2 times

How can I tackle this in Python?

Maybe that's silly, but if you predict the class of each sample in your test set, won't you implicitly get the rule that was used for each sample from its predicted class? (In your example, rule 1 has been used 5 times if and only if exactly 5 samples of your test set have been predicted to belong to class `'Low'`.) – Pauuuuuuul, Jul 16 '22 at 21:41
Thanks for your reply. I have edited my question to clarify it. With your logic, I can't tell how many times each individual rule was used when classifying. For instance, suppose more than one rule classifies as 'Low', and I want to count how many times each of those rules was used to classify data as 'Low'. In my implementation, I want to count the number of times each rule is used when classifying data. – user19563724, Jul 16 '22 at 21:52
Of course, if you have multiple rules predicting the same class, my proposition doesn't hold. Are you able to enumerate the rules in each of your leaves? Have you looked at `sklearn.tree.DecisionTreeClassifier`? [Here](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html) for the API documentation and [here](https://scikit-learn.org/stable/modules/tree.html#tree) for the user guide. – Pauuuuuuul, Jul 16 '22 at 21:58

Иван Балван · Answer 1 · 2022-07-16T22:48:37.453

0

Do you want something like this:

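import random

# Ten random test samples; randint is inclusive on both ends (1..11).
x_num = [random.randint(1, 11) for _ in range(10)]
y_num = [random.randint(1, 11) for _ in range(10)]

def func(xn, yn):
    """Count how many (x, y) samples are caught by each rule."""
    rule_1 = 0
    rule_2 = 0
    for x, y in zip(xn, yn):
        if x > 2 and y < 3:
            rule_1 += 1
        elif x < 4 and y > 2:
            # Reached only when rule 1 did not match.
            rule_2 += 1
    # Samples matching neither rule are not counted.
    return rule_1, rule_2

print(func(x_num, y_num))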

?
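A more generic variant of the same idea, as a minimal sketch: the rules are passed in as data instead of being hard-coded in the function. The names `RULES` and `count_rule_usage` and the sample values are hypothetical and only serve as illustration; the two predicates are the ones from the code above. Like the `if`/`elif` chain, it counts only the first rule that matches each sample.

```python
from collections import Counter

# Hypothetical rule table: (name, predicate) pairs, checked in order.
RULES = [
    ("Rule 1", lambda x, y: x > 2 and y < 3),
    ("Rule 2", lambda x, y: x < 4 and y > 2),
]

def count_rule_usage(samples, rules):
    """Count how often each rule is the first one to match a sample."""
    counts = Counter({name: 0 for name, _ in rules})
    for x, y in samples:
        for name, predicate in rules:
            if predicate(x, y):
                counts[name] += 1
                break  # first matching rule wins, like if/elif
    return dict(counts)

# Illustrative samples; the last one matches no rule and is not counted.
samples = [(3, 1), (5, 2), (1, 4), (3, 5), (9, 9)]
print(count_rule_usage(samples, RULES))
```

Adding a rule then means appending one entry to the list, with no change to the counting function.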

edited Jul 16 '22 at 22:48

answered Jul 16 '22 at 21:58

Иван Балван



Do you mean `for x, y in zip([1, 2, 3, 4], [4, 3, 2, 1])`? The way you wrote it, I think there will be only one iteration in your `for` loop and you will have `x = (1, 2, 3, 4)` and `y = (4, 3, 2, 1)` – Pauuuuuuul Jul 16 '22 at 22:02
Yes, thank you, I forgot... somehow – Иван Балван Jul 16 '22 at 22:04
Yes, I need something like that. However, I would like the code to be generic, if possible, so that I can reuse it in my implementation. – user19563724 Jul 16 '22 at 22:33
Do you mean, for example, to wrap the code in a function? (I edited the answer). – Иван Балван Jul 16 '22 at 22:50
But your implementation assumes a predefined number of rules. I would like a function that does the following: (1) train a decision tree and extract its rules (this part I know how to do); (2) predict unseen data and obtain the decision path followed to classify each instance (this is the part I don't know how to do and am asking about); (3) count the number of times each rule path has been used to classify unseen data. NB: I used the sklearn Python package to build and train my decision trees. – user19563724 Jul 16 '22 at 23:11

Pauuuuuuul · Answer 2 · 2022-07-16T22:29:13.510

0

If you're not familiar with it, I recommend using the sklearn Python package, and more precisely the `sklearn.tree.DecisionTreeClassifier` class. Here are the API documentation and the user guide.

This page should help you solve your problem as it gives more detail about the decision process and how to retrieve the path used to classify a sample.

Sorry if this answer doesn't solve your problem right away but it should get you on the way :)

edited Jul 16 '22 at 22:29

answered Jul 16 '22 at 22:09

Pauuuuuuul


Yes, I used the sklearn package to build my decision tree and had a look at the API documentation and the user guide. However, I could not find a function that returns the tree path used to predict a particular instance. Do you know what I can do, or could you let me know whether I have overlooked such a function in the given links? Thanks. – user19563724 Jul 16 '22 at 22:32
In the third link I provided, there is a "Decision path" section that explains how to get the rule / node used to classify a sample – Pauuuuuuul Jul 17 '22 at 11:21
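The "Decision path" idea from the comment above can be sketched as follows. This is a minimal illustration under assumptions, not code taken from the linked page: the iris dataset, the `max_depth=3` setting, and the helper name `path_to_rule` are chosen only for the example. It relies on two `DecisionTreeClassifier` methods: `apply`, which returns the leaf each sample ends up in, and `decision_path`, which returns the nodes visited on the way. Since every leaf corresponds to exactly one root-to-leaf rule, counting leaves is the same as counting rule usage.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris stands in for the asker's data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# apply() gives, for each test sample, the index of the leaf it lands in.
leaf_ids = clf.apply(X_test)
usage = Counter(leaf_ids.tolist())

def path_to_rule(clf, sample):
    """Describe the root-to-leaf path followed by one sample as text."""
    tree = clf.tree_
    # Node ids visited by this sample, from the root down to the leaf.
    node_ids = clf.decision_path(sample.reshape(1, -1)).indices
    conditions = []
    for node, next_node in zip(node_ids[:-1], node_ids[1:]):
        # Going to the left child means the test "feature <= threshold" held.
        op = "<=" if next_node == tree.children_left[node] else ">"
        conditions.append(f"x[{tree.feature[node]}] {op} {tree.threshold[node]:.2f}")
    return " and ".join(conditions)

# Build the rule text once per leaf that was actually reached.
rule_of_leaf = {}
for sample, leaf in zip(X_test, leaf_ids.tolist()):
    if leaf not in rule_of_leaf:
        rule_of_leaf[leaf] = path_to_rule(clf, sample)

for leaf, n in sorted(usage.items()):
    print(f"leaf {leaf}: used {n} times | {rule_of_leaf[leaf]}")
```

Leaves that no test sample reaches do not appear in `usage`; if a zero count is needed for them, the leaf ids can be read from `clf.tree_` (nodes whose `children_left` entry is -1).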
