PyCaret: Classification Score vs Label Inconsistency

Question

I am working on a binary classification task using PyCaret 2.3.

The model metrics look solid (e.g. Accuracy=0.9), and I am happy to use this model for predictions.

What I find confusing is the predictions generated. It seems the Score and Label do not align at all.

I would expect that sorting the prediction output by Score would show Label=1 for the highest Scores. However, the Score/Label are all over the place. The highest Score values have a Label of 0. And for Label=1 I see Score values ranging from 0.95 to 0.5007. The Score generally ranges from 0.5003 to 0.997.

score 4 · Answer 1 · answered Apr 13 '21 at 08:18

The Score is the predicted probability of the Label that was assigned to that row, not the probability of Label=1. That is why every Score you see is at least 0.5.

That is, Label=1 with Score=0.7 means there is a 70% probability that the row belongs to class 1. Conversely, Label=0 with Score=0.9 means there is a 90% probability that the row belongs to class 0 (and so only a 10% probability of class 1).
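If you want a single column you can sort on, you can convert Label/Score into the probability of class 1 yourself. Below is a minimal sketch in plain Python. The helper name positive_class_probability and the P_1 key are made up for illustration, and the rows only borrow Score values from the question (the Label pairings are illustrative, not the asker's real data).

```python
def positive_class_probability(label, score):
    """Turn a (Label, Score) pair into the probability of class 1.

    Assumes a binary problem where Score is the probability of the
    predicted Label, as described above.
    """
    return score if label == 1 else 1.0 - score


# Illustrative rows: Score values taken from the question,
# Label pairings assumed for the sake of the example.
rows = [
    {"Label": 0, "Score": 0.997},
    {"Label": 1, "Score": 0.95},
    {"Label": 1, "Score": 0.5007},
    {"Label": 0, "Score": 0.5003},
]

# Add the derived probability of class 1 to every row.
for row in rows:
    row["P_1"] = positive_class_probability(row["Label"], row["Score"])

# Sorting on P_1 (not on Score) puts the Label=1 rows at the top.
ranked = sorted(rows, key=lambda r: r["P_1"], reverse=True)
for r in ranked:
    print(r["Label"], r["Score"], round(r["P_1"], 4))
```

Sorted this way, the Label=0 row with Score=0.997 ends up at the bottom, because its probability of class 1 is only about 0.003.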

predict_model() accepts raw_score=True. This returns the probability of every class for each row, rather than only the probability of the predicted Label.
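A rough sketch of how that could be used is below. The wrapper predict_with_raw_scores is hypothetical, and the column name Score_1 is an assumption about how PyCaret 2.3 names the per-class columns, so check the columns of your own output before relying on it.

```python
def predict_with_raw_scores(model, data, positive_column="Score_1"):
    """Predict with per-class probabilities and sort by the class-1 column.

    `positive_column` is an assumed column name; if it is not present in
    the output, the predictions are returned unsorted.
    """
    # Imported here so the function can be defined without PyCaret loaded.
    from pycaret.classification import predict_model

    # raw_score=True asks for one probability column per class.
    preds = predict_model(model, data=data, raw_score=True)

    if positive_column in preds.columns:
        preds = preds.sort_values(positive_column, ascending=False)
    return preds
```

Called as predict_with_raw_scores(my_model, my_data), with my_model being your trained PyCaret model and my_data the DataFrame to score, the rows with Label=1 should then appear at the top.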
