Sklearn Random Forest different accuracy values for different label encodings

Question

I'm using sklearn's Random Forest to train my model. With the same input features, I tried encoding the target labels in two ways: first with label_binarize, which creates one-hot encodings of the labels, and second with LabelEncoder. The two runs give different accuracy scores. Is there a specific reason for this, given that I'm only changing how the labels are encoded and not changing any input features?

score 0 · Answer 1 · answered May 17 '20 at 22:36

The difference is not caused by the label encoding, but by the randomness of Random Forest.

Try fixing random_state to avoid this situation.

Gilseung Ahn

I'm using a constant random state for both runs – drew_psy May 17 '20 at 23:23

score 0 · Accepted Answer · answered May 18 '20 at 13:33

https://datascience.stackexchange.com/questions/74364/random-forrest-sklearn-gives-different-accuracy-for-different-target-label-encod

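Basically, when you one-hot encode your target labels, sklearn receives a 2D indicator array and treats the task as a multilabel problem. A label encoder gives a 1D array, which sklearn treats as a multiclass problem. The two problem types are fit and scored differently, so the accuracy can differ even with identical input features and the same random_state.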

https://scikit-learn.org/stable/modules/multiclass.html
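
To see the difference concretely, here is a minimal sketch on synthetic data. The dataset, the variable names, and the hyperparameters are assumptions made for illustration, not the asker's setup. It uses type_of_target to show how sklearn classifies each target array, then fits the same forest on both encodings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, label_binarize
from sklearn.utils.multiclass import type_of_target

# Synthetic 3-class data (an assumption for illustration only).
X, y_raw = make_classification(
    n_samples=600, n_features=20, n_informative=5, n_classes=3, random_state=0
)

# Encoding 1: 1D integer labels -> sklearn sees a multiclass target.
y_1d = LabelEncoder().fit_transform(y_raw)
# Encoding 2: 2D one-hot indicator matrix -> sklearn sees a multilabel target.
y_onehot = label_binarize(y_raw, classes=[0, 1, 2])

print(type_of_target(y_1d))      # multiclass
print(type_of_target(y_onehot))  # multilabel-indicator

# Split both encodings identically so the comparison is fair.
X_tr, X_te, y1_tr, y1_te, yoh_tr, yoh_te = train_test_split(
    X, y_1d, y_onehot, test_size=0.25, random_state=0
)

# Same model settings and random_state for both runs.
clf_mc = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y1_tr)
clf_ml = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, yoh_tr)

pred_1d = clf_mc.predict(X_te)  # shape (n_samples,)
pred_oh = clf_ml.predict(X_te)  # shape (n_samples, 3)

# Multiclass: fraction of samples whose single predicted class is correct.
acc_multiclass = accuracy_score(y1_te, pred_1d)
# Multilabel: subset accuracy, a row counts only if all 3 columns match.
acc_multilabel = accuracy_score(yoh_te, pred_oh)

# In the multilabel setting a row may contain zero or several 1s,
# which cannot happen with a single multiclass prediction.
n_invalid_rows = int((pred_oh.sum(axis=1) != 1).sum())

print("multiclass accuracy:", acc_multiclass)
print("multilabel (subset) accuracy:", acc_multilabel)
print("rows without exactly one predicted class:", n_invalid_rows)
```

In the multilabel run each column is predicted as its own 0/1 output, and accuracy_score only counts a sample as correct when every column matches. If the goal is single-label classification, passing the 1D encoded labels is usually the simpler choice.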
