I have a multiclass problem where 0 is my negative class and 1 and 2 are positive. Consider the following code:
import matplotlib.pyplot as plt  # needed for plt.show() below
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.metrics import f1_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
# Outputs
y_true = np.array((1, 2, 2, 0, 1, 0))
y_pred = np.array((1, 0, 0, 0, 0, 1))
# Metrics
precision_macro = precision_score(y_true, y_pred, average='macro')
precision_weighted = precision_score(y_true, y_pred, average='weighted')
recall_macro = recall_score(y_true, y_pred, average='macro')
recall_weighted = recall_score(y_true, y_pred, average='weighted')
f1_macro = f1_score(y_true, y_pred, average='macro')
f1_weighted = f1_score(y_true, y_pred, average='weighted')
# Confusion Matrix
cm = confusion_matrix(y_true, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.show()
The metrics calculated with Sklearn in this case are the following:
precision_macro = 0.25
precision_weighted = 0.25
recall_macro = 0.33333
recall_weighted = 0.33333
f1_macro = 0.27778
f1_weighted = 0.27778
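To see what goes into these averages, here is a minimal sketch that prints the per-class scores instead of the averaged ones. It assumes a scikit-learn version that supports the `zero_division` argument, which is passed only to silence the warning for class 2 (never predicted); the unpacked names `precision`, `recall`, `f1`, and `support` are just illustrative.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array((1, 2, 2, 0, 1, 0))
y_pred = np.array((1, 0, 0, 0, 0, 1))

# average=None returns one score per label instead of a single averaged value.
labels = [0, 1, 2]
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average=None, zero_division=0
)

for label, p, r, f, n in zip(labels, precision, recall, f1, support):
    print(f"class {label}: precision={p:.4f} recall={r:.4f} f1={f:.4f} support={n}")
```

Note that class 0 gets its own row here (precision 0.25, recall 0.5), so it takes part in the macro and weighted averages as well.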
And this is the confusion matrix:
Are the macro and weighted scores the same because I have the same number of samples for each class? This is what I did manually.
1 - Precision = TP/(TP+FP). So for classes 1 and 2, we get:
Precision1 = TP1/(TP1+FP1) = 1/(1+1) = 0.5
Precision2 = TP2/(TP2+FP2) = 0/(0+0) = 0 (this returns 0 according to the Sklearn documentation)
Precision_Macro = (Precision1 + Precision2)/2 = 0.25
Precision_Weighted = (2*Precision1 + 2*Precision2)/4 = 0.25
2 - Recall = TP/(TP+FN). So for classes 1 and 2, we get:
Recall1 = TP1/(TP1+FN1) = 1/(1+1) = 0.5
Recall2 = TP2/(TP2+FN2) = 0/(0+2) = 0
Recall_Macro = (Recall1+Recall2)/2 = (0.5+0)/2 = 0.25
Recall_Weighted = (2*Recall1+2*Recall2)/4 = (2*0.5+2*0)/4 = 0.25
3 - F1 = 2*(Precision*Recall)/(Precision+Recall)
F1_Macro = 2*(Precision_Macro*Recall_Macro)/(Precision_Macro+Recall_Macro) = 0.25
F1_Weighted = 2*(Precision_Weighted*Recall_Weighted)/(Precision_Weighted+Recall_Weighted) = 0.25
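The hand calculation above can be cross-checked with a small pure-Python sketch. It rests on two assumptions that differ from the steps above: the macro average runs over every label present (0, 1, and 2), and macro F1 is the unweighted mean of the per-class F1 scores rather than the F1 of the averaged precision and recall. The helpers `safe_div` and `prf` are names made up for this example.

```python
y_true = [1, 2, 2, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1]


def safe_div(num, den):
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def prf(label):
    """Precision, recall, and F1 for one label, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


labels = sorted(set(y_true) | set(y_pred))  # [0, 1, 2]
scores = [prf(label) for label in labels]

macro_p = sum(s[0] for s in scores) / len(labels)  # 0.25
macro_r = sum(s[1] for s in scores) / len(labels)  # about 0.3333

# Two different ways to aggregate F1:
mean_of_f1 = sum(s[2] for s in scores) / len(labels)             # about 0.2778
f1_of_means = safe_div(2 * macro_p * macro_r, macro_p + macro_r)  # about 0.2857

print(macro_p, macro_r, mean_of_f1, f1_of_means)
```

Under these assumptions, the mean of the per-class F1 scores gives about 0.2778, while plugging the averaged precision and recall into the F1 formula gives about 0.2857.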
So, the precision score is the same as Sklearn's, but recall and F1 are different. What did I do wrong here? Even if you use the precision and recall values from Sklearn (i.e., 0.25 and 0.3333), you can't get the 0.27778 F1 score.
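For comparison, here is a hedged sketch that restricts the averaging to the positive classes through the `labels` argument of the scikit-learn metric functions. The list `positive` and the `*_pos` variable names are introduced here for illustration, and `zero_division=0` assumes a scikit-learn version that supports that argument.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 2, 2, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1]

# Only these labels are included in the macro average; class 0 is left out.
positive = [1, 2]

precision_pos = precision_score(
    y_true, y_pred, labels=positive, average="macro", zero_division=0
)
recall_pos = recall_score(
    y_true, y_pred, labels=positive, average="macro", zero_division=0
)
f1_pos = f1_score(
    y_true, y_pred, labels=positive, average="macro", zero_division=0
)

print(precision_pos, recall_pos, f1_pos)
```

With class 0 excluded this way, all three macro scores come out at 0.25, the same as the manual values for classes 1 and 2.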