
What could explain an F1 score that is not the harmonic mean of precision and recall when the metrics are macro-averaged (every class weighted equally) for a multi-class problem? My dataset is imbalanced, and the predictions are skewed.

Java questioner
    Not a programming question, hence arguably off-topic here; better suited for [Cross Validated](https://stats.stackexchange.com/help/on-topic). – desertnaut Feb 02 '19 at 19:07

1 Answer


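A macro F1 calculates the metrics for each label and takes their unweighted mean, which means it does not take class imbalance into account. A weighted F1, by contrast, calculates the metrics for each label and averages them weighted by the number of true instances of each label. Hence, it accounts for class imbalance and can produce a score that is not between precision and recall. In both cases, under this definition, the reported F1 is an average of the per-label F1 scores rather than the harmonic mean of the averaged precision and the averaged recall, so the two values need not coincide.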
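To make this concrete, here is a minimal sketch in plain Python. It assumes the per-label averaging convention described above, which is also what scikit-learn's `f1_score` uses for `average='macro'` and `average='weighted'`. The labels and predictions are made-up toy data, and the helper names (`per_class_prf`, `average`, `harmonic_mean`) are hypothetical, not part of any library.

```python
def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each label."""
    stats = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        support = sum(1 for t in y_true if t == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        stats[c] = (precision, recall, f1, support)
    return stats


def average(stats, weighted):
    """Average precision, recall and F1 over labels.

    weighted=False gives the macro average (each label counts once);
    weighted=True weights each label by its support.
    """
    weights = [s[3] if weighted else 1 for s in stats.values()]
    total = sum(weights)
    return tuple(
        sum(s[i] * w for s, w in zip(stats.values(), weights)) / total
        for i in range(3)
    )


def harmonic_mean(p, r):
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical imbalanced data: class 0 dominates and is over-predicted.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 2]

stats = per_class_prf(y_true, y_pred)
macro_p, macro_r, macro_f1 = average(stats, weighted=False)
weighted_p, weighted_r, weighted_f1 = average(stats, weighted=True)

print("macro    P, R, F1:", macro_p, macro_r, macro_f1)
print("harmonic mean of macro P and R:", harmonic_mean(macro_p, macro_r))
print("weighted P, R, F1:", weighted_p, weighted_r, weighted_f1)
```

On this toy data the macro F1 differs from the harmonic mean of macro precision and macro recall, and the weighted F1 comes out lower than both weighted precision and weighted recall, i.e. not between them.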

For an example of weighted F1, refer to this answer Sandeep.

Raunaq Jain