
I want to use Scikit-Learn's GridSearchCV to run a bunch of experiments and then print out the recall, precision, and f1 of each experiment.

This article (https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html) suggests that I need to run .fit and .predict multiple times.

...
scores = ['precision', 'recall']
...
for score in scores:
    ...
    clf = GridSearchCV(
        SVC(), tuned_parameters, scoring='%s_macro' % score
    )
    clf.fit(X_train, y_train) # running for each scoring metric
    ...
    for mean, std, params in zip(means, stds, clf.cv_results_['params']):
        print("%0.3f (+/-%0.03f) for %r"
              % (mean, std * 2, params))
    ...
    y_true, y_pred = y_test, clf.predict(X_test) # running for each scoring metric
    print(classification_report(y_true, y_pred))

I would like to just run .fit once and log all of the recall, precision, and f1 metrics. So for example, something along the lines of:

clf = GridSearchCV(
    SVC(), tuned_parameters, scoring=['recall', 'precision', 'f1'] # I don't think this syntax is even possible
)

clf.fit(X_train, y_train)

for metric in clf.something_that_i_cannot_find:
    ### does something like this exist?
    print(metric['precision'])
    print(metric['recall'])
    print(metric['f1'])
    ###:end does something like this exist?

Or maybe even:

...
for run in clf.something_that_i_cannot_find:
    ### does something like this exist?
    print(classification_report(run.y_true, run.y_pred))
    ###:end does something like this exist?

This article (Scoring in Gridsearch CV) suggests that GridSearchCV can be made aware of multiple scorers, but I still can't figure out how to access each of those scores for all of the experiments.
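For reference, the multi-metric syntax I think that article describes looks roughly like the following (the scorer names and the refit choice here are just my guess at how it would be wired up):

clf = GridSearchCV(
    SVC(), tuned_parameters,
    scoring={'precision': 'precision_macro',
             'recall': 'recall_macro',
             'f1': 'f1_macro'},
    refit='f1'  # with multiple scorers, refit apparently has to name one of them (or be False)
)
clf.fit(X_train, y_train)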

Is what I'm looking for not supported by GridSearchCV? Is the method used in the article (i.e. running .fit and .predict multiple times) the easiest way to accomplish something similar to what I'm asking for?

Thank you for your time

Zhao Li
  • You are going to have to do it manually, which would take a lot of code: use folds from scikit-learn and loop over the parameters. I would suggest setting the random state and running the grid search 3 times. – Ibrahim Sherif Aug 27 '21 at 22:34
  • Thank you for the suggestion. I'll take that approach. If you want to type up your comment as an answer, I'll accept it to close the loop on this. – Zhao Li Aug 27 '21 at 22:38

2 Answers


You can do multiple-metric evaluation on binary classification. I encountered ValueError: Multi-class not supported when I tried the same thing on the iris dataset.

I have implemented it on basic binary data below, where I am calculating four different scores:

['AUC', 'F1', 'Precision', 'Recall']

Note: the idea is not to draw conclusions from the model, only to show how multiple-metric evaluation works. The data is just random data.

# imports needed to run this example
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import make_scorer, f1_score
import matplotlib.pyplot as plt
import numpy as np

# random binary classification data
X, y = datasets.make_classification(n_classes=2, random_state=0)

# The scorers can be either one of the predefined metric strings or a scorer
# callable, like the one returned by make_scorer
f1_scorer = make_scorer(f1_score, average='binary')
scoring = {'AUC': 'roc_auc', 'F1': f1_scorer, 'Precision': 'precision', 'Recall': 'recall'}

# split data to train and test data
X_train, X_test, y_train, y_test =  train_test_split(X, y, test_size=0.2)

clf = GridSearchCV(
    SVC(),
    param_grid={'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
    scoring=scoring,
    refit='AUC',
    return_train_score=True
)
clf.fit(X_train, y_train)
results = clf.cv_results_
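
With a dict passed to scoring, cv_results_ gets one set of columns per scorer (mean_test_AUC, std_test_AUC, rank_test_AUC, and so on), so a single .fit is enough to log every metric for every parameter combination. A small sketch of how those columns, plus the held-out test split, could be inspected after the fit above (the formatting is arbitrary):

# one line per parameter combination, with every cross-validated metric
for i, params in enumerate(results['params']):
    summary = ", ".join(
        "%s=%.3f (+/-%.3f)" % (name,
                               results['mean_test_%s' % name][i],
                               results['std_test_%s' % name][i] * 2)
        for name in scoring)
    print(params, summary)

# precision/recall/f1 on the held-out split, using the estimator refit on 'AUC'
from sklearn.metrics import classification_report
print(classification_report(y_test, clf.predict(X_test)))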


**Plotting the result**

plt.figure(figsize=(10, 10))
plt.title("GridSearchCV evaluating using multiple scorers simultaneously",
      fontsize=16)

plt.xlabel("min_samples_split")
plt.ylabel("Score")

ax = plt.gca()
ax.set_xlim(1, 1000)
ax.set_ylim(0.40, 1)

# Get the regular numpy array from the MaskedArray
X_axis = np.array(results['param_C'].data, dtype=float)

for scorer, color in zip(sorted(scoring), ['g', 'k', 'b', 'r']):
    for sample, style in (('train', '--'), ('test', '-')):
        sample_score_mean = results['mean_%s_%s' % (sample, scorer)]
        sample_score_std = results['std_%s_%s' % (sample, scorer)]
        ax.fill_between(X_axis, sample_score_mean - sample_score_std,
                        sample_score_mean + sample_score_std,
                        alpha=0.1 if sample == 'test' else 0, color=color)
        ax.plot(X_axis, sample_score_mean, style, color=color,
                alpha=1 if sample == 'test' else 0.7,
                label="%s (%s)" % (scorer, sample))

    best_index = np.nonzero(results['rank_test_%s' % scorer] == 1)[0][0]
    best_score = results['mean_test_%s' % scorer][best_index]

    # Plot a dotted vertical line at the best score for that scorer marked by x
    ax.plot([X_axis[best_index], ] * 2, [0, best_score],
            linestyle='-.', color=color, marker='x', markeredgewidth=3, ms=8)

    # Annotate the best score for that scorer
    ax.annotate("%0.2f" % best_score,
                (X_axis[best_index], best_score + 0.005))

plt.legend(loc="best")
plt.grid(False)
plt.show()

Output plot: mean train and test score (with a ±1 std band) for each scorer across the values of C.

Priya
  • Thank you for the example code. I’ll give it a try on Monday. Have a great weekend – Zhao Li Aug 28 '21 at 05:19
  • I'm thinking/looking over the code a bit in preparation for Monday and am curious whether I'm following your code correctly. Can you please confirm if the following is true or not? The `test` cases in the plot show the `test` results within the cross validation, and hence are a subset of the training data (`X_train`/`y_train`); they are not the results of applying a model to the test data from `train_test_split` (`X_test`/`y_test`). Is my thinking correct, or did I miss something in the example code? – Zhao Li Aug 28 '21 at 18:27
  • Thank you again for your help and guidance – Zhao Li Aug 28 '21 at 18:29
  • @Zhao Li Yeah, you are right. The `test` cases are not the actual `X_test` and `y_test`; the `test` cases in the plot are the `validation` splits that the cross-validation logic creates. Sorry if this created confusion over the `test` cases in the plot. – Priya Aug 29 '21 at 04:31
  • Glad to help! You are welcome! Please upvote if it works for you. – Priya Aug 29 '21 at 04:32
  • The default number of cross-validation folds is `cv=5` in the `GridSearchCV` constructor. – Priya Aug 29 '21 at 04:40
  • Gotcha, thanks for clarifying. If you know of a way to capture the recall, precision, and f1 from `y_test`, please let me know. Thank you for your help. – Zhao Li Aug 29 '21 at 06:41
  • Good question! Will definitely look into that. – Priya Aug 30 '21 at 02:09

You are going to have to do it manually, which would take a fair amount of code: loop over the folds using scikit-learn and add another loop over the parameters. I would suggest setting the random state for the fold strategy, the grid search and the model, and running the grid search 3 times, once per metric.
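
As a rough illustration of that manual loop (the data, the parameter grid and the seeds below are only placeholders, not part of the original suggestion): iterate over a ParameterGrid and a seeded StratifiedKFold, and compute each metric per fold yourself.

from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, StratifiedKFold, train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.svm import SVC
import numpy as np

# placeholder binary data and grid, just to make the sketch self-contained
X, y = make_classification(n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
param_grid = {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}

# fixed random state for the fold strategy
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for params in ParameterGrid(param_grid):
    fold_scores = {'precision': [], 'recall': [], 'f1': []}
    for train_idx, valid_idx in cv.split(X_train, y_train):
        model = SVC(random_state=0, **params)  # fixed random state for the model
        model.fit(X_train[train_idx], y_train[train_idx])
        y_pred = model.predict(X_train[valid_idx])
        fold_scores['precision'].append(precision_score(y_train[valid_idx], y_pred))
        fold_scores['recall'].append(recall_score(y_train[valid_idx], y_pred))
        fold_scores['f1'].append(f1_score(y_train[valid_idx], y_pred))
    print(params, {name: round(np.mean(vals), 3) for name, vals in fold_scores.items()})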

Ibrahim Sherif