I was looking for a good error metric for multiclass classifiers, and many people say the F1 measure is commonly used. But given that the predictions of a multiclass classifier are one-hot vectors, doesn't that mean there are no true positives whenever a single prediction is wrong? What I mean is:
When the prediction is correct, every element of the vector is a true negative except for the single '1', which is a true positive. So the precision for that sample is just 1.
And when the prediction is incorrect, there are no true positives at all, so the precision is 0.
I can see that F1 is a powerful metric for multilabel classification, since there can be more than one '1' in the vector, but applying F1 to multiclass classification seems a bit strange to me. Isn't it just the same as accuracy? Or does it mean that the F1 score should be computed per class and then averaged?
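To make the question concrete, here is a small sketch of what I think "per-class F1" would mean, using made-up toy labels (the labels and helper function are just for illustration). It computes F1 one class at a time by treating that class as the positive class, then macro-averages; it also checks my suspicion that pooling the counts over all classes (micro-averaging) collapses back to accuracy:

```python
# Toy single-label multiclass example (labels are assumed, not real data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

classes = sorted(set(y_true) | set(y_pred))

def class_counts(y_true, y_pred, cls):
    # One-vs-rest counts: treat `cls` as positive, everything else as negative.
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return tp, fp, fn

def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Macro-F1: average of per-class F1 scores.
macro_f1 = sum(f1(*class_counts(y_true, y_pred, c)) for c in classes) / len(classes)

# Micro-F1: pool the counts over all classes, then compute one F1.
tp_all = sum(class_counts(y_true, y_pred, c)[0] for c in classes)
fp_all = sum(class_counts(y_true, y_pred, c)[1] for c in classes)
fn_all = sum(class_counts(y_true, y_pred, c)[2] for c in classes)
micro_f1 = f1(tp_all, fp_all, fn_all)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(macro_f1, micro_f1, accuracy)
```

If I've understood correctly, micro-F1 always equals accuracy in the single-label multiclass case (every false positive for one class is a false negative for another), while macro-F1 can differ because it weights each class equally regardless of how many samples it has.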