
I am confused by the different F1 computations below. Which F1 scoring should I use for severely imbalanced data? I am working on a severely imbalanced binary classification problem.

‘f1’
‘f1_micro’
‘f1_macro’
‘f1_weighted’

Also, I want to use balanced_accuracy_score(y_true, y_pred, adjusted=True) as the balanced_accuracy scoring argument. How can I incorporate this into my code?

from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from imblearn.metrics import geometric_mean_score
X, y = load_breast_cancer(return_X_y=True)

gm_scorer = make_scorer(geometric_mean_score, greater_is_better=True)
scores = cross_validate(
    LogisticRegression(max_iter=100000), X, y, cv=5,
    scoring={'gm_scorer': gm_scorer, 'F1': 'f1', 'Balanced Accuracy': 'balanced_accuracy'},
)
scores
  • f1_micro is the global F1 score; if you care about how F1 is affected by each class (with imbalance), you would want to use `f1_macro` or `f1_weighted`. More details in my answer below. – Akshay Sehgal Apr 06 '21 at 19:19
  • 2
    I’m voting to close this question because it is not about programming as defined in the [help] but about ML theory and/or methodology - please see the intro and NOTE in the `machine-learning` [tag info](https://stackoverflow.com/tags/machine-learning/info). – desertnaut Apr 07 '21 at 11:51

1 Answer


f1_micro computes a single global F1 over all samples, while f1_macro computes the per-class F1 scores and then takes their unweighted average.

It's similar to precision and its micro, macro, and weighted averaging options in sklearn. Do check the SO post Type of precision where I explain the difference. The F1 score is basically a way to consider both precision and recall at the same time.

Also, as per documentation:

'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.

'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

'weighted': Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.
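To see how the three averages behave, here is a toy illustration with made-up labels for an imbalanced binary problem (8 negatives, 2 positives); the numbers are only for demonstration:

from sklearn.metrics import f1_score

# Hypothetical imbalanced labels: 8 negatives, 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

print(f1_score(y_true, y_pred, average='micro'))     # ~0.70: global F1 (equals accuracy here)
print(f1_score(y_true, y_pred, average='macro'))     # ~0.60: unweighted mean of per-class F1
print(f1_score(y_true, y_pred, average='weighted'))  # ~0.72: support-weighted mean of per-class F1

Here the majority class (F1 = 0.8) dominates the micro and weighted scores, while the macro score is pulled down by the minority class (F1 = 0.4).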

For your specific case, you might want to use f1_macro (unweighted average of class-wise F1) or f1_weighted (support-weighted average of class-wise F1), as f1_micro masks the class-wise contributions to the overall F1 and is dominated by the majority class.
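For completeness, here is a minimal sketch of how these scorers could be plugged into the cross_validate call from the question, including the adjusted balanced accuracy asked about; it relies on make_scorer forwarding the extra adjusted=True keyword on to balanced_accuracy_score:

from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer, balanced_accuracy_score
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from imblearn.metrics import geometric_mean_score

X, y = load_breast_cancer(return_X_y=True)

gm_scorer = make_scorer(geometric_mean_score, greater_is_better=True)
# make_scorer passes extra keyword arguments through to the score function,
# so this gives the adjusted (chance-corrected) balanced accuracy
adjusted_bal_acc = make_scorer(balanced_accuracy_score, adjusted=True)

scores = cross_validate(
    LogisticRegression(max_iter=100000), X, y, cv=5,
    scoring={
        'GM': gm_scorer,
        'F1 (macro)': 'f1_macro',          # unweighted mean of per-class F1
        'F1 (weighted)': 'f1_weighted',    # support-weighted mean of per-class F1
        'Adjusted balanced accuracy': adjusted_bal_acc,
    },
)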

  • I got a pretty strange result when I used the f1_weighted measure. I posted a separate question to look at the differences between the popular scoring metrics used for imbalanced data. – ForestGump Apr 08 '21 at 19:09