Why might a reported F1 score not be the harmonic mean of precision and recall?

Question

Why would a reported F1 score not be the harmonic mean of precision and recall when using a macro average (all classes weighted equally) for a multi-class problem? My dataset is imbalanced, and the predictions are skewed.

Not a programming question, hence arguably off-topic here; better suited for [Cross Validated](https://stats.stackexchange.com/help/on-topic). — desertnaut, Feb 02 '19 at 19:07

Raunaq Jain · Answer 1 · 2019-02-02T19:05:23.777

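Macro F1 calculates the metrics for each label and takes their unweighted mean, so it does not take class imbalance into account. Weighted F1, by contrast, calculates the metrics for each label and averages them weighted by the number of true instances of each label. It therefore accounts for class imbalance, and it can produce an F1 score that does not lie between precision and recall. With either averaging scheme, the reported F1 is typically an average of the per-class F1 scores rather than the harmonic mean of the averaged precision and recall, so the two need not agree.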
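To make the difference concrete, here is a minimal pure-Python sketch. The labels and predictions are made up for illustration, and the helper names (`per_class_scores`, `average`, `harmonic_mean`) are hypothetical, not from any library. It assumes the common convention that the averaged F1 is the mean of per-class F1 scores.

```python
from collections import Counter

def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)}, computed one-vs-rest."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true instances per label
    predicted = Counter(y_pred)    # predicted instances per label
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    scores = {}
    for label in labels:
        precision = tp[label] / predicted[label] if predicted[label] else 0.0
        recall = tp[label] / support[label] if support[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, support[label])
    return scores

def average(scores, index, weighted=False):
    """Average one metric (0=precision, 1=recall, 2=f1) across labels."""
    if weighted:
        total = sum(s[3] for s in scores.values())
        return sum(s[index] * s[3] for s in scores.values()) / total
    return sum(s[index] for s in scores.values()) / len(scores)

def harmonic_mean(a, b):
    return 2 * a * b / (a + b) if (a + b) else 0.0

# Made-up imbalanced data: six samples of class 0, two each of classes 1 and 2.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 0, 1, 0, 2]

scores = per_class_scores(y_true, y_pred)

macro_p = average(scores, 0)
macro_r = average(scores, 1)
macro_f1 = average(scores, 2)                 # mean of per-class F1
naive_f1 = harmonic_mean(macro_p, macro_r)    # harmonic mean of the averages

weighted_p = average(scores, 0, weighted=True)
weighted_r = average(scores, 1, weighted=True)
weighted_f1 = average(scores, 2, weighted=True)

print(f"macro:    P={macro_p:.3f} R={macro_r:.3f} F1={macro_f1:.3f} "
      f"(harmonic mean of P and R: {naive_f1:.3f})")
print(f"weighted: P={weighted_p:.3f} R={weighted_r:.3f} F1={weighted_f1:.3f}")
```

On this toy data the macro F1 is about 0.645, while the harmonic mean of macro precision and macro recall is about 0.669. The weighted F1 (about 0.695) falls below both weighted precision (about 0.729) and weighted recall (0.700), which is the "not between precision and recall" case described above.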

For an example of weighted F1, refer to this answer Sandeep.
