
As explained in this article, it matters for calculating the F1 score (that is, for calculating recall and precision) whether those calculations are based on the positive or the negative class. For example, if I have a skewed dataset with 1% of the labels in category A and 99% in category B, I assign A as the positive category, and I classify all test items as positive, then my F1 score will be very good. How do I tell scikit-learn which category is the positive one in a binary classification? (If helpful, I can provide code.)
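A minimal sketch of what I mean (hypothetical data, with category A encoded as 1 and category B as 0):

```python
from sklearn.metrics import f1_score

# Hypothetical skewed data: 1 item of category A (encoded as 1),
# 99 items of category B (encoded as 0).
y_true = [1] * 1 + [0] * 99

# The classifier labels every test item as category A.
y_pred = [1] * 100

# Is this score computed with A or with B as the positive class?
# On data this skewed the two choices give very different values.
print(f1_score(y_true, y_pred))
```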

You_got_it
  • Related: https://stackoverflow.com/questions/50933561/how-to-specify-positive-label-when-use-precision-as-scoring-in-gridsearchcv – Maggie Nov 27 '18 at 22:47
  • "For example, if I have a skewed dataset with 1% labels of category A and 99% labels of category B and I am just assigning A the positive category and classify all test items as positive, my F-1 score will be very good." How would your F1-score be good? Wouldn't your precision be 0.01 and your recall 1, meaning an F1-score of about 0.0198? – Tejas_hooray Jul 30 '23 at 10:23

1 Answer


For binary classification, sklearn.metrics.f1_score assumes by default that 1 is the positive class and 0 is the negative class. If you follow that convention (0 for category B, 1 for category A), you will get the behavior you want. You can override the default by passing the pos_label keyword argument to f1_score.

See: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html
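For example (a minimal sketch; the 0/1 and string labels here are just illustrative):

```python
from sklearn.metrics import f1_score

# With 0/1 labels, class 1 is the positive class by default.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(f1_score(y_true, y_pred))                     # ~0.67, scored for class 1

# With string labels, pos_label selects which category counts as positive.
y_true_s = ["B", "B", "A", "A", "A"]
y_pred_s = ["B", "A", "A", "A", "B"]
print(f1_score(y_true_s, y_pred_s, pos_label="A"))  # ~0.67, scored for category A
print(f1_score(y_true_s, y_pred_s, pos_label="B"))  # 0.5, scored for category B
```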

David Maust
  • Thanks for this answer. However, is there a reference for this in the official documentation? – Johann Hagerer Apr 08 '17 at 21:45
  • According to the manual page http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html, `pos_label` defaults to 1 (i.e. class 1 is the positive class), but it can be overridden. – David Maust Apr 13 '17 at 21:48