What is the threshold for the sklearn roc_auc_score?

Question

In my classification problem, I want to check whether my model has performed well, so I computed roc_auc_score to evaluate it and got the value 0.9856825361839688.

This is my code:

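from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data, split 70/30 into train and test sets
x, y = make_classification(n_samples=2000, n_classes=2, weights=[1, 1], random_state=24)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=43)

# Fit k-nearest neighbours with default settings and score the training set
knn_classifier = KNeighborsClassifier()
knn_classifier.fit(x_train, y_train)
ytrain_pred = knn_classifier.predict_proba(x_train)  # column 1 holds P(class 1)
print('train roc-auc: {}'.format(roc_auc_score(y_train, ytrain_pred[:, 1])))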

train roc-auc: 0.9856825361839688

Now I plot the ROC curve to look for the best score:

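import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import roc_curve

# ROC curve on the training set: one (FPR, TPR) point per candidate threshold
fpr_1, tpr_1, thresholds_1 = roc_curve(y_train, ytrain_pred[:, 1])
fig, ax = plt.subplots(1, 1, figsize=(15, 7))
g = sns.lineplot(x=fpr_1, y=tpr_1, ax=ax, color='green')
g.set_xlabel('False Positive Rate')
g.set_ylabel('True Positive Rate')
g.set(xlim=(0, 0.8))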

From the plot I can see that the TPR reaches its maximum at an FPR of about 0.2. Given the roc_auc_score I got, should I conclude that the function used 0.2 as the threshold?

I explicitly calculated the accuracy score for each threshold:

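import pandas as pd

# accuracy_ls (not shown) holds one accuracy value per entry of thresholds_1
_result = pd.concat([pd.Series(thresholds_1), pd.Series(accuracy_ls)], axis=1)
_result.columns = ['threshold', 'accuracy score']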
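
The code that builds accuracy_ls is not shown above. A minimal sketch of one way such a list could be computed follows. The stand-in labels and scores, and the rule "predict 1 when score >= threshold", are assumptions made for illustration; with the real data one would pass y_train and ytrain_pred[:, 1] instead.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, roc_curve

# Stand-in data (assumption): replace with y_train and ytrain_pred[:, 1]
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
scores = np.clip(0.5 * y_true + rng.uniform(0.0, 0.6, size=500), 0.0, 1.0)

# Candidate thresholds as returned by roc_curve
_, _, thresholds = roc_curve(y_true, scores)

# Accuracy of the hard predictions obtained at each threshold
accuracy_ls = []
for thr in thresholds:
    y_pred = (scores >= thr).astype(int)
    accuracy_ls.append(accuracy_score(y_true, y_pred))

result = pd.DataFrame({'threshold': thresholds, 'accuracy score': accuracy_ls})
result = result.sort_values('accuracy score', ascending=False)
print(result.head())
```

The first row of the sorted table is the threshold with the highest accuracy on this data, which is a different quantity from the single number roc_auc_score returns.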

So, should I conclude that roc_auc_score gives the highest score regardless of the threshold?

Which operating point (threshold) is best depends on your application. What's worse: False positives or false negatives? — couka, Feb 25 '21 at 07:49

score 3 · Answer 1 · edited Feb 25 '21 at 08:32

The function roc_auc_score is used to evaluate a classifier. It tells you the area under the ROC curve. (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html)

roc_auc_score == 1 means an ideal classifier.

For binary classification with an equal number of samples for both classes in the evaluated dataset: roc_auc_score == 0.5 means a random classifier.
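
To make the two reference values concrete, here is a small sketch on synthetic data. The labels and both score arrays are made up for illustration (balanced classes, as in the case above); nothing here comes from the question's model.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Balanced binary labels: 5000 negatives followed by 5000 positives
y_true = np.repeat([0, 1], 5000)

# Scores that rank every positive above every negative
perfect_scores = y_true + rng.uniform(0.0, 0.5, size=y_true.size)

# Scores drawn independently of the labels
random_scores = rng.uniform(size=y_true.size)

auc_perfect = roc_auc_score(y_true, perfect_scores)
auc_random = roc_auc_score(y_true, random_scores)
print('perfect ranking:', auc_perfect)
print('random scores: ', auc_random)
```

The first value is exactly 1, and the second should land close to 0.5, with a small deviation that depends on the random draw.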

roc_auc_score does not compare thresholds against each other, and it does not pick one. It summarizes the ROC curve over all thresholds in a single number.
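
A short sketch of why no single threshold is involved, using made-up scores (an assumption for illustration). The area obtained by integrating the whole curve from roc_curve matches roc_auc_score, and a strictly increasing transform of the scores, which changes every threshold value but keeps the ranking, leaves the result unchanged.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

rng = np.random.default_rng(24)
y_true = rng.integers(0, 2, size=1000)

# Hypothetical scores: positives tend to score higher than negatives
scores = rng.normal(loc=y_true.astype(float), scale=1.0)

# Area computed by integrating the full ROC curve (all thresholds)
fpr, tpr, thresholds = roc_curve(y_true, scores)
area_from_curve = auc(fpr, tpr)

# Area computed directly from labels and scores
area_direct = roc_auc_score(y_true, scores)

# exp() is strictly increasing: different threshold values, same ranking
area_transformed = roc_auc_score(y_true, np.exp(scores))

print(area_from_curve, area_direct, area_transformed)
```

All three numbers agree, so the score reflects how well the model ranks positives above negatives, not how it performs at a threshold such as 0.2.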

Which threshold is better is something you have to decide yourself, depending on the business problem you are trying to solve. What is more important for you: precision or recall?
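
As a rough illustration of that trade-off, the sketch below tabulates precision and recall over a grid of thresholds. The stand-in probabilities and the grid from 0.1 to 0.9 are assumptions chosen for the example, not values from the question.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(43)
y_true = rng.integers(0, 2, size=1000)

# Stand-in probabilities (assumption): positives centred higher than negatives
logits = rng.normal(loc=2.0 * y_true - 1.0, scale=1.0)
scores = 1.0 / (1.0 + np.exp(-logits))

# Hypothetical grid of thresholds to compare
candidate_thresholds = np.linspace(0.1, 0.9, 9)

precisions, recalls = [], []
for thr in candidate_thresholds:
    # Predict the positive class when the score reaches the threshold
    y_pred = (scores >= thr).astype(int)
    precisions.append(precision_score(y_true, y_pred, zero_division=0))
    recalls.append(recall_score(y_true, y_pred, zero_division=0))

for thr, p, r in zip(candidate_thresholds, precisions, recalls):
    print(f'threshold={thr:.1f}  precision={p:.3f}  recall={r:.3f}')
```

Raising the threshold can only lower recall, while precision tends to rise, so the "best" row depends on which kind of error costs more in your application.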
