
I am trying to calculate the f1_score, but I get warnings for some cases when I use sklearn's f1_score method.

I have a multilabel prediction problem with 5 classes.

import numpy as np
from sklearn.metrics import f1_score, precision_recall_fscore_support

y_true = np.zeros((1,5))
y_true[0,0] = 1 # => label = [[1, 0, 0, 0, 0]]

y_pred = np.zeros((1,5))
y_pred[:] = 1 # => prediction = [[1, 1, 1, 1, 1]]

result_1 = f1_score(y_true=y_true, y_pred=y_pred, labels=None, average="weighted")

print(result_1) # prints 1.0

result_2 = precision_recall_fscore_support(y_true=y_true, y_pred=y_pred, labels=None, average="weighted")

print(result_2) # prints: (1.0, 1.0, 1.0, None) for precision/recall/fbeta_score/support

When I use average="samples" instead of "weighted", I get (0.1, 1.0, 0.1818..., None). Is the "weighted" option not useful for a multilabel problem, or how do I use the f1_score method correctly?

I also get a warning when using average="weighted":

"UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples."

KyleReemoN-

1 Answer


It works if you add a bit more data:

import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = np.array([[1,0,0,0], [1,1,0,0], [1,1,1,1]])
y_pred = np.array([[1,0,0,0], [1,1,1,0], [1,1,1,1]])

recall_score(y_true=y_true, y_pred=y_pred, average='weighted')
>>> 1.0
precision_score(y_true=y_true, y_pred=y_pred, average='weighted')
>>> 0.9285714285714286

f1_score(y_true=y_true, y_pred=y_pred, average='weighted')
>>> 0.95238095238095244

The data shows that we have not missed any true positives, i.e. there are no false negatives (recall_score equals 1). However, we have predicted one false positive in the second observation, which leads to a precision_score of ~0.93.

Since both precision_score and recall_score are non-zero with the weighted parameter, f1_score is well-defined as well. I believe your case fails because the example contains too little information: with a single sample, most of the labels have no true instances at all.
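To see where those numbers come from, here is a short sketch that computes the per-label scores and rebuilds the weighted precision by hand (the commented values are my own arithmetic, not a verified run):

import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([[1,0,0,0], [1,1,0,0], [1,1,1,1]])
y_pred = np.array([[1,0,0,0], [1,1,1,0], [1,1,1,1]])

# Per-label scores: label 2 carries the single false positive
# from the second observation.
p, r, f, support = precision_recall_fscore_support(y_true, y_pred, average=None)
print(p)        # expected: [1.  1.  0.5 1. ]
print(r)        # expected: [1. 1. 1. 1.]
print(support)  # expected: [3 2 1 1]

# "weighted" averaging is the support-weighted mean of the per-label scores:
print(np.average(p, weights=support))  # expected: 6.5 / 7 = 0.9285714...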

E.Z
  • Hi, my array created with np.zeros((1,5)) has the shape (1,5); I just wrote the comment to show what one sample looks like, but it really has the form [[1,0,0,0,0], ...]. The problem is that f1_score works with average="micro"/"macro" but not with "weighted". So my question is: does the "weighted" option not work for multilabel, or do I have to set other options like labels/pos_label in the f1_score function? – KyleReemoN- Oct 16 '17 at 11:20
  • Please read the answer. You cannot work with a target variable whose shape is (1, 5). In that case your `f1_score` does not work even with 'micro' or 'macro' averaging. – E.Z Oct 16 '17 at 11:38
  • When I use ravel to get the shape (5,), each value is treated as one sample, so it does not work for multilabel; e.g. when I try that shape with average="samples" I get the error "Sample-based precision, recall, fscore is not meaningful outside multilabel classification." I get working results for the shape (1,5) with micro and macro (and they are correct); the only problem is with the option average="weighted" (see the sketch after these comments). – KyleReemoN- Oct 17 '17 at 11:47
  • Try adding more data; then the metric will be calculated correctly. – E.Z Oct 17 '17 at 13:03
  • @E.Z. can you take a look this question : https://stackoverflow.com/questions/59195168/multilabel-multiclass-accuracy-how-to-calculate-accuracy-for-multiclass-mult – Aaditya Ura Dec 05 '19 at 12:49
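Regarding the shape discussion in the comments, a small sketch of what I would expect: a 2-D (n_samples, n_labels) indicator array is treated as multilabel, while the ravelled 1-D array is treated as single-label, which is why average="samples" is rejected for it (the exact error message may differ between scikit-learn versions):

import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 0, 0, 0]])
y_pred = np.array([[1, 1, 1, 1, 1]])

# 2-D indicator arrays are treated as multilabel, so sample-wise
# averaging is defined here.
print(f1_score(y_true, y_pred, average="samples"))

# After ravel() the arrays have shape (5,) and are interpreted as five
# separate single-label samples, so sample-wise averaging is rejected.
try:
    f1_score(y_true.ravel(), y_pred.ravel(), average="samples")
except ValueError as err:
    print(err)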