Decision tree classifier sees float values as numeric data, but they need to be treated as one-hot encoded data

Question

I am running a decision tree classifier on the data in the picture. As you can see, some features, such as time signature and signature key, need to be one-hot encoded as 1s and 0s. However, in the dataframe all the 0s and 1s are of type float. As a result, my decision tree classifier does not seem to distinguish between a feature being present or absent; instead it decides whether a feature is useful by splitting at 0.5, as can be seen in the second picture. How can I fix this?

Thanks in advance.

I already tried converting all the floats to ints, but couldn't figure out exactly how.

score 0 · Answer 1 · answered Mar 26 '23 at 06:08


Decision trees don't classify by checking whether a feature is present or not. Binary values used in the tree are separated by a threshold (such as 0.5), so that 1s fall on one side and 0s on the other.

This is how trees operate; there is no bug.

Here is a StatQuest video about classification decision trees: https://www.youtube.com/watch?v=_L39rN6gz7Y
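To illustrate the point about thresholds, here is a minimal sketch, assuming scikit-learn's DecisionTreeClassifier is the classifier in use (the question does not say which library it is). The toy data and variable names are made up for illustration. It fits the same tree on a 0/1 column stored as float and as int and compares the root split:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical one-hot column: 1.0 if the feature is present, 0.0 otherwise.
X_float = np.array([[0.0], [0.0], [1.0], [1.0]])
X_int = X_float.astype(int)
y = np.array([0, 0, 1, 1])

tree_float = DecisionTreeClassifier(random_state=0).fit(X_float, y)
tree_int = DecisionTreeClassifier(random_state=0).fit(X_int, y)

# Node 0 is the root; its threshold is the midpoint between 0 and 1.
threshold_float = tree_float.tree_.threshold[0]
threshold_int = tree_int.tree_.threshold[0]
print(threshold_float, threshold_int)  # 0.5 in both cases
```

A split at 0.5 on a column that only contains 0 and 1 is exactly a "present or absent" test, so the dtype of the column should not change what the tree learns.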


Matan Bendak


Regarding changing float columns to int: df["column_name1"] = df["column_name1"].astype(int) – Matan Bendak Mar 26 '23 at 06:10
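If several columns need the same conversion, the single-column cast above can be extended. The following is only a sketch: the dataframe and its column names are hypothetical, since the actual data is only shown in a picture. It casts only those float columns whose values are all 0 or 1 and leaves genuinely continuous columns alone:

```python
import pandas as pd

# Hypothetical frame; the column names are made up for illustration.
df = pd.DataFrame({
    "time_signature_4": [1.0, 0.0, 1.0],
    "key_5": [0.0, 1.0, 0.0],
    "tempo": [120.5, 98.0, 140.2],
})

# Find float columns that contain nothing but 0 and 1.
float_cols = df.select_dtypes(include="float").columns
binary_cols = [c for c in float_cols if df[c].isin([0, 1]).all()]

# Cast only those columns to int; other columns keep their dtype.
df = df.astype({c: int for c in binary_cols})
print(df.dtypes)
```

As the answer explains, this cast is cosmetic for the tree itself: the split at 0.5 stays the same either way.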
