
I have read some papers on state-of-the-art semantic segmentation models, and in all of them the authors use the F1-score metric for comparison, but they do not state whether they use the "micro" or "macro" version of it.

Does anyone know which F1-score is used to report segmentation results, and why is it apparently so obvious that authors do not define it in their papers?

Sample papers:

https://arxiv.org/pdf/1709.00201.pdf

https://arxiv.org/pdf/1511.00561.pdf

Panicum
  • There is only one F1-score. It is defined as the harmonic mean of precision and recall. See [the Wikipedia article](https://en.wikipedia.org/wiki/F-score). You might be thinking of different ways to compute precision and recall? Please elaborate on your question. – Cris Luengo Feb 08 '21 at 15:25

1 Answer


There is just one F1-score: the harmonic mean of precision and recall.
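As a quick illustration with made-up precision and recall values (not taken from either paper), the harmonic mean can be computed directly:

```python
# F1 is the harmonic mean of precision and recall.
precision = 0.8  # hypothetical value, for illustration only
recall = 0.5     # hypothetical value, for illustration only

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # ≈ 0.6154
```

Note that the harmonic mean is dominated by the smaller of the two inputs, which is why F1 penalizes a model that trades recall for precision (or vice versa) more harshly than a simple arithmetic mean would.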

Macro/micro/samples/weighted/binary come into play for multiclass/multilabel targets; the descriptions below follow the `average` parameter of scikit-learn's `f1_score`. If `None`, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

binary: Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.

micro: Calculate metrics globally by counting the total true positives, false negatives and false positives.

macro: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

weighted: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.

samples: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification, where this differs from accuracy_score).
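A minimal pure-Python sketch (using a made-up toy labeling, not data from either paper) of how the micro and macro variants are computed and why they diverge on imbalanced targets:

```python
from collections import Counter

def f1_variants(y_true, y_pred):
    """Micro- and macro-averaged F1 for single-label multiclass data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1       # true positive for class t
        else:
            fp[p] += 1       # predicted p, but was not p
            fn[t] += 1       # was t, but not predicted as t

    def f1(t, f_p, f_n):
        # Equivalent to the harmonic mean of precision and recall.
        denom = 2 * t + f_p + f_n
        return 2 * t / denom if denom else 0.0

    # micro: pool TP/FP/FN over all classes, then compute one F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # macro: compute F1 per class, then take the unweighted mean
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 2, 2, 2, 2, 0]
micro, macro = f1_variants(y_true, y_pred)
print(round(micro, 4), round(macro, 4))  # 0.7 0.6667
```

In single-label multiclass settings, micro-averaged F1 equals overall accuracy (here 7 of 10 predictions are correct), while macro-averaging weights each class equally regardless of its support, so the rare, poorly predicted class 1 drags the macro score down.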

The SegNet paper reports per-class accuracy separately in Table 5, so I think they effectively chose `None` (per-class scores) in that case.

Abhi25t