Difference between sklearn.roc_auc_score() and sklearn.plot_roc_curve()

Question

I'd like to evaluate my machine learning model. I computed the area under the ROC curve with roc_auc_score() and plotted the ROC curve with plot_roc_curve() functions of sklearn. In the second function the AUC is also computed and shown in the plot. Now my problem is, that I get different results for the two AUC.

Here's the reproducible code with sample dataset:

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import plot_roc_curve
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

model = MLPClassifier(random_state=42)
model.fit(X_train, y_train)
yPred = model.predict(X_test)

print(roc_auc_score(y_test, yPred))
plot_roc_curve(model, X_test, y_test)
plt.show()

The roc_auc_score function gives me 0.979 and the plot shows 1.00. Despite the fact that the second function takes the model as an argument and predicts yPred again, the outcome should not differ. It is not a round off error. If I decrease training iterations to get a bad predictor the values still differ.

With my real dataset I "achieved" a difference of 0.1 between the two methods. How does this aberration come?

Reveille · Answer 1 · 2020-12-11T15:53:23.597

You should pass the prediction probabilities to roc_auc_score, and not the predicted classes. Like this:

yPred_p = model.predict_proba(X_test)[:,1]
print(roc_auc_score(y_test, yPred_p))

# output: 0.9983354140657512

When you pass the predicted classes, this is actually the curve for which AUC is being calculated (which is wrong):

Code to regenerate:

from sklearn.metrics import roc_curve, auc
fpr, tpr, _ = roc_curve(y_test, yPred)
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, label='AUC = ' + str(round(roc_auc, 2)))
plt.legend(loc='lower right')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')

Difference between sklearn.roc_auc_score() and sklearn.plot_roc_curve()

1 Answers1