
I have tried many examples with F1 micro and Accuracy in scikit-learn and in all of them, I see that F1 micro is the same as Accuracy. Is this always true?

Script

from sklearn import svm
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, accuracy_score

# prepare dataset
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# svm classification
clf = svm.SVC(kernel='rbf', gamma=0.7, C=1.0).fit(X_train, y_train)
y_predicted = clf.predict(X_test)

# performance
print "Classification report for %s" % clf
print metrics.classification_report(y_test, y_predicted)

print("F1 micro: %1.4f\n" % f1_score(y_test, y_predicted, average='micro'))
print("F1 macro: %1.4f\n" % f1_score(y_test, y_predicted, average='macro'))
print("F1 weighted: %1.4f\n" % f1_score(y_test, y_predicted, average='weighted'))
print("Accuracy: %1.4f" % (accuracy_score(y_test, y_predicted)))

Output

Classification report for SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma=0.7, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
             precision    recall  f1-score   support

          0       1.00      0.90      0.95        10
          1       0.50      0.88      0.64         8
          2       0.86      0.50      0.63        12

avg / total       0.81      0.73      0.74        30

F1 micro: 0.7333

F1 macro: 0.7384

F1 weighted: 0.7381

Accuracy: 0.7333

F1 micro = Accuracy

Just life

4 Answers


In classification tasks for which every test case is guaranteed to be assigned to exactly one class, micro-F1 is equivalent to accuracy. This is not the case in multi-label classification.
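
For example, here is a minimal sketch (toy labels of my own, not from the answer) showing the two scores diverging in a multi-label setting, where accuracy_score computes exact-match (subset) accuracy:

import numpy as np
from sklearn.metrics import f1_score, accuracy_score

# 3 samples, 3 labels, multi-label indicator format
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

print(f1_score(y_true, y_pred, average='micro'))  # ~0.75 (TP=3, FP=0, FN=2)
print(accuracy_score(y_true, y_pred))             # ~0.33 (only 1 of 3 rows matches exactly)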

shahensha
  • In classifying an imbalanced dataset, accuracy doesn't make sense, but micro F1 won't make sense either (because they have the same value)? And I read somewhere that micro F1 should be used instead of macro F1 for imbalanced datasets. How does this all tally? – Bikash Gyawali May 11 '19 at 21:55
  • @bikashg You are right. Micro F1 doesn't make sense for the same reason that accuracy doesn't make sense. Did you read it in a paper? Can you please link it? – shahensha Sep 18 '19 at 23:07
  • Thank you! I'm doing a 3-class classification problem and ran into this. Is there any proof of this? – user900476 Jan 19 '23 at 02:31

This is because we are dealing with multi-class classification, where every test instance should belong to exactly one class and not multiple labels. In such a case there is no separate TN count, and we can treat the true negatives as true positives.

Formula-wise:

[image: formulas for precision, recall and F1]

Correction: the F1 score is 2 * precision * recall / (precision + recall).

[image: formulas, continued]
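
As a quick sanity check of that corrected formula, here is a minimal sketch (my own addition) plugging in the class-1 precision (0.50) and recall (0.88) from the question's classification report:

p, r = 0.50, 0.88           # class 1 in the question's report
print(2 * p * r / (p + r))  # ~0.64, matching the f1-score column for class 1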

  • But TPs are almost never equal to TNs. Take [this](https://towardsdatascience.com/micro-macro-weighted-averages-of-f1-score-clearly-explained-b603420b292f) example: the TNs of Airplane, Boat, and Car will be 6, 6, and 4 respectively, and the sum of TPs for all 3 classes (6) is not equal to the sum of TNs for all 3 classes (16). Where am I wrong? – Ritwik Apr 15 '22 at 12:00
  • Hi, I think your definition of recall is incorrect. The correct statement is: Recall = TP / (TP + FN); the same applies to your definition of precision, they are the other way around. – minggli Nov 16 '22 at 15:06
  • The definitions of Precision and Recall given above are wrong. Take a look at the definition at https://en.wikipedia.org/wiki/Precision_and_recall. To understand why "micro F1" equals "accuracy" in a multi-class classification setting, take a look at https://scikit-learn.org/stable/modules/model_evaluation.html#multiclass-and-multilabel-classification – lenhhoxung Feb 15 '23 at 09:14

Micro-averaged precision, recall, F1 and accuracy are all equal for cases in which every instance must be classified into one (and only one) class. A simple way to see this is by looking at the formulas precision = TP/(TP+FP) and recall = TP/(TP+FN). The numerators are the same, and every FN for one class is another class's FP, which makes the denominators the same as well. If precision = recall, then F1 will also be equal.

For any inputs, you should be able to show that:

import numpy as np
from sklearn.metrics import accuracy_score as acc
from sklearn.metrics import f1_score as f1

# holds whenever each sample has exactly one true and one predicted label
assert np.isclose(f1(y_true, y_pred, average='micro'), acc(y_true, y_pred))
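
To see the counting argument concretely, here is a minimal sketch (the labels are made up for illustration) that derives per-class FPs and FNs from the confusion matrix and confirms they sum to the same total:

import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp               # predicted as class c, but true class differs
fn = cm.sum(axis=1) - tp               # true class is c, but predicted as something else

print(fp.sum(), fn.sum())                         # 3 3 -- every FN is another class's FP
print(f1_score(y_true, y_pred, average='micro'))  # ~0.571
print(accuracy_score(y_true, y_pred))             # ~0.571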
jasperr

I had the same issue so I investigated and came up with this:

Just thinking about the theory, it is impossible for accuracy and the F1 score to be the very same for every single dataset. The reason for this is that the F1 score is independent of the true negatives, while accuracy is not.

By taking a dataset where f1 = acc and adding true negatives to it, you get f1 != acc.

>>> from sklearn.metrics import accuracy_score as acc
>>> from sklearn.metrics import f1_score as f1
>>> y_pred = [0, 1, 1, 0, 1, 0]
>>> y_true = [0, 1, 1, 0, 0, 1]
>>> acc(y_true, y_pred)
0.6666666666666666
>>> f1(y_true,y_pred)
0.6666666666666666
>>> y_true = [0, 1, 1, 0, 1, 0, 0, 0, 0]
>>> y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 0]
>>> acc(y_true, y_pred)
0.7777777777777778
>>> f1(y_true,y_pred)
0.6666666666666666
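
A short follow-up sketch (my own addition, not part of the answer): the session above uses f1_score's default average='binary'; asking for the micro average on the second pair of lists makes the score match accuracy again, because micro-averaging counts every correct prediction regardless of class:

>>> abs(f1(y_true, y_pred, average='micro') - acc(y_true, y_pred)) < 1e-9
True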
T. Short
Patrick
  • For other people: this doesn't address the question. The question is about the relation between micro F1 and accuracy in multiclass problems; here f1_score just returns the per-class F1 of the label given by `pos_label=1`. – MoeNeuron Apr 01 '23 at 20:59