How to integrate G-mean in cross_validate sklearn function?

Question

from sklearn.model_selection import cross_validate
from sklearn.linear_model import LogisticRegression

scores = cross_validate(LogisticRegression(class_weight='balanced', max_iter=100000),
                        X, y, cv=5, scoring=('roc_auc', 'average_precision', 'f1', 'recall', 'balanced_accuracy'))
scores['test_roc_auc'].mean(), scores['test_average_precision'].mean(), scores['test_f1'].mean(), scores['test_recall'].mean(), scores['test_balanced_accuracy'].mean()

How can I compute the following G-mean through the scoring parameter of the cross_validate call above?

from imblearn.metrics import geometric_mean_score
print('The geometric mean is {}'.format(geometric_mean_score(y_test, y_test_pred)))

or

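import numpy as np
from sklearn.metrics import accuracy_score

# y_test and y_test_pred are assumed to be NumPy arrays of labels.
# Accuracy restricted to one class's samples equals that class's recall.
g_mean = 1.0
labels = np.unique(y_test)
for label in labels:
    idx = (y_test == label)
    g_mean *= accuracy_score(y_test[idx], y_test_pred[idx])
# n-th root over the n classes (the square root in the binary case)
score = g_mean ** (1.0 / len(labels))
print(score)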
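For reference, the same loop can be packaged as a standalone function that works for any number of classes. This is a minimal sketch assuming array-like label inputs; g_mean is a hypothetical helper name, not part of scikit-learn or imbalanced-learn.

```python
import numpy as np


def g_mean(y_true, y_pred):
    """Hypothetical helper: geometric mean of the per-class recalls."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = np.unique(y_true)
    # Recall of each class: fraction of its samples that were predicted correctly
    recalls = np.array([np.mean(y_pred[y_true == label] == label) for label in labels])
    # n-th root of the product of the n per-class recalls
    return float(np.prod(recalls) ** (1.0 / len(labels)))


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(g_mean(y_true, y_pred))  # recalls are 3/4 and 1/2, so sqrt(0.375), about 0.612
```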

Miguel Trejo · Accepted Answer · 2021-04-06T02:28:06.053

Just pass it as a custom scorer

from sklearn.metrics import make_scorer
from imblearn.metrics import geometric_mean_score

gm_scorer = make_scorer(geometric_mean_score, greater_is_better=True, average='binary')

Set greater_is_better=True because the best values are closer to 1. Additional keyword arguments for geometric_mean_score (such as average='binary') can be passed directly to make_scorer, which forwards them to the metric.
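If imbalanced-learn is not installed, the same pattern works with a hand-written metric. This is a minimal sketch using scikit-learn only; g_mean_score, the toy data, and the DummyClassifier are illustrative assumptions. It also shows a keyword argument given to make_scorer (here labels) being forwarded to the metric.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer, recall_score


def g_mean_score(y_true, y_pred, labels=None):
    """Hypothetical metric: geometric mean of the per-class recalls."""
    # average=None returns one recall value per class
    recalls = recall_score(y_true, y_pred, labels=labels, average=None)
    return float(np.prod(recalls) ** (1.0 / len(recalls)))


# Extra keyword arguments to make_scorer are passed on to g_mean_score
g_mean_scorer = make_scorer(g_mean_score, greater_is_better=True, labels=[0, 1])

# Toy data and a classifier that always predicts the majority class (1)
X = np.arange(6).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1, 1])
clf = DummyClassifier(strategy="most_frequent").fit(X, y)

# A scorer is called as scorer(estimator, X, y)
print(g_mean_scorer(clf, X, y))  # recall of class 0 is 0, so the G-mean is 0.0
```

The resulting g_mean_scorer can be passed to scoring= exactly like gm_scorer below.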

Full Example

from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from imblearn.metrics import geometric_mean_score

X, y = load_breast_cancer(return_X_y=True)

gm_scorer = make_scorer(geometric_mean_score, greater_is_better=True)

scores = cross_validate(
    LogisticRegression(class_weight='balanced',max_iter=100000),
    X,y, 
    cv=5, 
    scoring=gm_scorer
)
scores
>>>
{'fit_time': array([0.76488066, 0.69808364, 1.22158527, 0.94157672, 1.01577377]),
 'score_time': array([0.00103951, 0.00100923, 0.00065804, 0.00071168, 0.00068736]),
 'test_score': array([0.91499142, 0.93884403, 0.9860133 , 0.92439026, 0.9525989 ])}

EDIT

To specify multiple metrics, pass a dict to the scoring parameter that maps each metric name to either a scorer callable or a built-in scorer string:

scores = cross_validate(
    LogisticRegression(class_weight='balanced',max_iter=100000),
    X,y, 
    cv=5, 
    scoring={'gm_scorer': gm_scorer, 'AUC': 'roc_auc', 'Avg_Precision': 'average_precision'}
)
scores
>>>
{'fit_time': array([1.03509665, 0.96399784, 1.49760461, 1.13874388, 1.32006526]),
 'score_time': array([0.00560617, 0.00357151, 0.0057447 , 0.00566769, 0.00549698]),
 'test_gm_scorer': array([0.91499142, 0.93884403, 0.9860133 , 0.92439026, 0.9525989 ]),
 'test_AUC': array([0.99443171, 0.99344907, 0.99801587, 0.97949735, 0.99765258]),
 'test_Avg_Precision': array([0.99670544, 0.99623085, 0.99893162, 0.98640759, 0.99861043])}

Why did you use average='binary'? Also, is it possible to use 'roc_auc' and 'average_precision' together with gm_scorer? I tried scoring=('gm_scorer', 'roc_auc', 'average_precision'), but it didn't work! — ForestGump, Apr 05 '21 at 14:47
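@ForestGump yes, you can pass multiple metrics (one of which is a custom function) through a dictionary to `scoring`. A tuple only accepts names of built-in scorers, and 'gm_scorer' is a variable name, not a registered scorer string. — Miguel Trejo, Apr 06 '21 at 02:28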
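On the average='binary' question: the imbalanced-learn documentation describes the binary G-mean as the square root of sensitivity times specificity, and specificity for the positive class is simply the recall of the negative class. So, as far as I can tell, for two classes the choice of average should not change the value. The check below only illustrates that arithmetic with hypothetical labels, computed by hand with NumPy rather than by calling imbalanced-learn.

```python
import numpy as np

# Hypothetical toy labels; class 1 is the positive class
y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))

sensitivity = tp / (tp + fn)  # recall of class 1
specificity = tn / (tn + fp)  # recall of class 0

# Binary form: sqrt(sensitivity * specificity)
g_binary = np.sqrt(sensitivity * specificity)

# Per-class form: geometric mean of the recalls of classes 0 and 1
recalls = [np.mean(y_pred[y_true == c] == c) for c in (0, 1)]
g_per_class = np.prod(recalls) ** (1 / 2)

print(g_binary, g_per_class)  # both equal sqrt(0.75 * 0.5)
```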

score 0 · Answer 2 · edited Apr 02 '21 at 14:47


You need to make a custom scorer; here's an example: https://stackoverflow.com/a/53850851/12384070. Then, if it's the only scorer you want, you can do:

scores = cross_validate(LogisticRegression(class_weight='balanced',max_iter=100000),
                        X,y, cv=5, scoring=your_custom_function)

I think you can also use it alongside the other scorers, as explained in the docs:

If scoring represents multiple scores, one can use:

a list or tuple of unique strings;

a callable returning a dictionary where the keys are the metric names and the values are the metric scores;

a dictionary with metric names as keys and callables as values.
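The second option (a callable returning a dictionary) lets one function report the G-mean and other metrics together. This is a minimal sketch, assuming a scikit-learn version whose cross_validate accepts such a callable, as the documentation quoted above describes; multi_scorer is a hypothetical name, and the G-mean is computed from per-class recalls rather than with imbalanced-learn.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import cross_validate


def multi_scorer(estimator, X, y):
    """Hypothetical callable scorer returning several metrics at once."""
    y_pred = estimator.predict(X)
    recalls = recall_score(y, y_pred, average=None)  # one recall per class
    return {
        "g_mean": float(np.prod(recalls) ** (1.0 / len(recalls))),
        "roc_auc": roc_auc_score(y, estimator.decision_function(X)),
    }


X, y = load_breast_cancer(return_X_y=True)
scores = cross_validate(
    LogisticRegression(class_weight="balanced", max_iter=100000),
    X, y, cv=5, scoring=multi_scorer,
)
# cross_validate prefixes each returned key with "test_"
print(scores["test_g_mean"].mean(), scores["test_roc_auc"].mean())
```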

answered Apr 02 '21 at 14:42 by SashimiDélicieux · edited Apr 02 '21 at 14:47 by Dharman

Do I need to add **kwargs to the custom function's parameters? – ForestGump Apr 02 '21 at 14:48

Your function definition should look like this: def geometric_mean_score(y_test, y_pred, **kwargs). Then pass it to make_scorer this way: make_scorer(geometric_mean_score). This returns your custom scorer, which you should be able to pass to the cross_validate function. – SashimiDélicieux Apr 03 '21 at 12:02
