
I am trying to do binary classification. Since I have a small dataset (275 samples), I have used leave-one-out cross-validation and want to get the average classification report and AUROC/AUPRC across all folds.

I have closely followed this link to arrive at my results, but I cannot understand what the code is doing in the last line.

for i in classifiers:
    print(i)
    originalclass = []
    predictedclass = []
    model = i
    loo = LeaveOneOut()
    print('Scores before feature selection')
    scores = cross_val_score(model, subset, y, cv=loo, scoring=make_scorer(classification_report_with_accuracy_score))
    print("CV score", np.mean(cross_val_score(model, subset, y, cv=loo, scoring='roc_auc')))
    print(classification_report(originalclass, predictedclass))
    print('Scores after feature selection')
    X_reduced = feature_reduction_using_RFECV(model, subset, y)
    scores = cross_val_score(model, X_reduced, y, cv=loo, scoring=make_scorer(classification_report_with_accuracy_score))
    print("CV score", np.mean(cross_val_score(model, X_reduced, y, cv=loo, scoring='roc_auc')))
    print(classification_report(originalclass, predictedclass))

Where exactly is the averaging happening in the above code? I am calculating the mean CV score and printing it, but the line after that confuses me the most. I initialize the originalclass and predictedclass variables at the beginning, but where are they populated before being printed in the last line?

print(classification_report(originalclass, predictedclass))
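
For reference, the helper that make_scorer wraps is defined in the linked code and is not shown here; based on how originalclass and predictedclass are used, it presumably looks something like the sketch below (the function body is an assumption, not code from the question):

# Presumed shape of the scorer helper from the linked post (an assumption —
# it is not reproduced in the question). Because it is wrapped by make_scorer,
# it is called once per fold and appends that fold's labels to the two lists
# defined at the top of the loop above.
from sklearn.metrics import accuracy_score

def classification_report_with_accuracy_score(y_true, y_pred):
    originalclass.extend(y_true)    # accumulate ground truth across folds
    predictedclass.extend(y_pred)   # accumulate predictions across folds
    return accuracy_score(y_true, y_pred)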

Edited code

for i in classifiers:
    print(i)
    originalclass = y
    model = i
    loo = LeaveOneOut()
    print('Scores before feature selection')
    y_pred = cross_val_predict(model, subset, y, cv=loo)
    print("CV score", np.mean(cross_val_score(model, subset, y, cv=loo, scoring='roc_auc')))
    print(classification_report(originalclass, y_pred))
    print('Scores after feature selection')
    X_reduced = feature_reduction_using_RFECV(model, subset, y)
    y_pred = cross_val_predict(model, X_reduced, y, cv=loo)
    print("CV score", np.mean(cross_val_score(model, X_reduced, y, cv=loo, scoring='roc_auc')))
    print(classification_report(originalclass, y_pred))
bandit_king28
  • Your question does not make any sense; you can certainly average CV scores (here with `np.mean`), but you *cannot* "average" classification reports. Please see the [docs](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html) to understand exactly what a classification report actually is – desertnaut Aug 06 '19 at 11:14
  • In cross-validation we train on every n-1 folds and test on the nth fold. As a result, for every run we get a confusion matrix and consequently a classification report. Can I not average the sensitivity/specificity etc. for every fold? – bandit_king28 Aug 06 '19 at 11:24

1 Answer


When you use

print("CV score",np.mean(cross_val_score(model,X_reduced,y,cv=loo,scoring='roc_auc'))) 

you print the average cross-validated roc_auc score of your model under the chosen cv scheme, i.e. LeaveOneOut.
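
For illustration, here is a minimal, self-contained sketch of that averaging pattern on synthetic data (the dataset and model below are hypothetical). Note that 'accuracy' is used instead of 'roc_auc': each LeaveOneOut test fold contains a single sample, so per-fold roc_auc is undefined (see the comments below).

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical data and model, just to show the mean-of-fold-scores pattern
X, y = make_classification(n_samples=40, n_features=6, random_state=1)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=LeaveOneOut(), scoring='accuracy')
print("CV score", np.mean(scores))  # one number: the mean over all folds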


The next command:

print(classification_report(originalclass, predictedclass))

is used to print the full classification report, rather than only a single averaged metric (like the roc_auc in the previous line).

This function takes the following input arguments:

classification_report(y_true, y_pred)

Here, y_true is your originalclass (the ground truth), and y_pred should be the cross-validated label predictions.


You should have something like this:

y_pred = cross_val_predict(model, X_reduced, y, cv=loo)
print(classification_report(originalclass, y_pred))

Now y_pred already contains the cross-validated label predictions, so the classification report will show the pooled cross-validated results for each classification metric.


Toy example to illustrate the above:

from sklearn.metrics import classification_report

originalclass = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
print(classification_report(originalclass, y_pred))

              precision    recall  f1-score   support

           0       0.50      1.00      0.67         1
           1       0.00      0.00      0.00         1
           2       1.00      0.67      0.80         3

   micro avg       0.60      0.60      0.60         5
   macro avg       0.50      0.56      0.49         5
weighted avg       0.70      0.60      0.61         5
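
Putting the above together, a minimal self-contained sketch (with hypothetical data and model, not from the original answer) of the cross_val_predict + classification_report pattern under LeaveOneOut:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import classification_report

# Hypothetical binary dataset and model
X, y = make_classification(n_samples=50, n_features=8, random_state=42)
model = LogisticRegression(max_iter=1000)
loo = LeaveOneOut()

# One out-of-fold prediction per sample, pooled across all LOO folds
y_pred = cross_val_predict(model, X, y, cv=loo)

# A single report computed over the pooled out-of-fold predictions
print(classification_report(y, y_pred))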
seralouk
  • Thanks for your reply. So just to clarify, the code that I pasted above doesn't calculate the average classification report averaged across all folds? – bandit_king28 Aug 06 '19 at 11:01
  • 1
    your initial code will not work at all. Where is `originalclass, predictedclass` constructed? They are empty – seralouk Aug 06 '19 at 11:11
  • now it seems okay. `originalclass = y` can be outside the loop of course. consider upvoting and accepting my answer – seralouk Aug 06 '19 at 11:23
  • So finally we are predicting our labels using the cross validated model and getting the classification report, right? – bandit_king28 Aug 06 '19 at 11:30
  • exactly. `cross_val_predict` predicts the labels using cross validations. – seralouk Aug 06 '19 at 11:33
  • +1 for your help. I am also trying to run this command `print(np.mean(cross_val_score(model,subset,y,cv=loo,scoring='roc_auc')))` but I am getting an error `ValueError: Only one class present in y_true. ROC AUC score is not defined in that case`. – bandit_king28 Aug 06 '19 at 11:36
  • does `y` contain more than one class/label? It should, because AUC is only defined for at least binary problems (2 labels/classes); see the sketch after these comments for a pooled-prediction workaround – seralouk Aug 06 '19 at 11:45
  • `y.value_counts()` is binary only class 0: 223 class 1: 52 – bandit_king28 Aug 06 '19 at 11:49
  • `y` should be a numpy array or list, e.g. `[0,0,0,1,1,1]`, containing the label for each sample – seralouk Aug 06 '19 at 12:38
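
Regarding the ValueError raised in the comments above: with LeaveOneOut every test fold contains exactly one sample, so per-fold roc_auc cannot be computed. One common workaround (a sketch with hypothetical data and model, not code from the answer) is to pool the out-of-fold predicted probabilities with cross_val_predict and compute a single AUROC/AUPRC over all samples:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical imbalanced binary dataset, similar in spirit to 223 vs 52
X, y = make_classification(n_samples=60, n_features=10, weights=[0.8],
                           random_state=0)
model = LogisticRegression(max_iter=1000)
loo = LeaveOneOut()

# Each LOO fold predicts a probability for its single held-out sample;
# pooling them gives one vector of out-of-fold scores for the whole dataset
proba = cross_val_predict(model, X, y, cv=loo, method='predict_proba')[:, 1]

print("AUROC:", roc_auc_score(y, proba))            # defined: y has both classes
print("AUPRC:", average_precision_score(y, proba))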