I have trained a binary classifier, but I think that my ROC curve is incorrect.

This is the vector that contains labels:

y_true = [0, 1, 1, 1, 0, 1, 0, 1, 0]

and the second vector is the score vector

y_score = [
    0.43031937, 0.09115553, 0.00650781, 0.02242869, 0.38608587, 
    0.09407699, 0.40521139, 0.08062053, 0.37445426
]

When I plot my ROC curve, I get the following:

[ROC curve plot: the curve runs along the bottom axis and then up the right edge]

I think the code is correct, but I don't understand why I'm getting this curve, why the tpr, fpr, and threshold lists have length 4, or why my AUC is equal to zero.

fpr [0.   0.25 1.   1.  ]
tpr [0. 0. 0. 1.]
threshold [1.43031937 0.43031937 0.37445426 0.00650781]

My Code:

import sklearn.metrics as metrics

fpr, tpr, threshold = metrics.roc_curve(y_true, y_score)
roc_auc = metrics.auc(fpr, tpr)

# method I: plt
import matplotlib.pyplot as plt
plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label='AUC = %0.2f' % roc_auc)
plt.legend(loc='lower right')
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
Guizmo Charo

2 Answers

One thing to keep in mind about AUC is that what's really important is distance from 0.5. If you have a really low AUC, that just means that your "positive" and "negative" labels are switched.

Looking at your scores, it's clear that a low score (anything less than ~0.095) means a 1 and anything above that threshold is a 0. So you actually have a great binary classifier!
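
A quick sanity check, as a sketch using that ~0.095 cutoff:

preds = [1 if s < 0.095 else 0 for s in y_score]  # predict 1 for low scores
print(preds)            # [0, 1, 1, 1, 0, 1, 0, 1, 0]
print(preds == y_true)  # True: matches the labels exactly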

The problem is that by default, higher scores are associated with the label 1. So you're labeling points with high scores as 1's instead of 0's. Thus you're wrong 100% of the time. In that case, just switch your predictions and you'll be correct 100% of the time.

The simple fix is to use the pos_label argument to sklearn.metrics.roc_curve. In this case you want your positive label to be 0.

fpr, tpr, threshold = metrics.roc_curve(y_true, y_score, pos_label=0)
roc_auc = metrics.auc(fpr, tpr)
print(roc_auc)
#1.0
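
Equivalently, you could flip the scores themselves so that higher values point to class 1, for example:

# Flipping the scores is equivalent to pos_label=0 for ranking purposes
y_score_flipped = [1 - s for s in y_score]
fpr, tpr, threshold = metrics.roc_curve(y_true, y_score_flipped)
print(metrics.auc(fpr, tpr))  # 1.0

As for the length-4 arrays: by default roc_curve drops thresholds that don't change the shape of the curve (drop_intermediate=True), so only the corner points remain, and it prepends one extra threshold (max(y_score) + 1, hence the 1.43031937) so the curve starts at (0, 0).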
pault
  • Thanks for your reply. I think my scores and labels may be wrong; I just posted my full code. For the scores I used cosine similarity, and for the labels I used 0 and 1: if two faces are similar the label is 1, and if they are different the label is 0. So I have a vector of 0/1 labels and a vector of cosine-similarity scores, which means I'm indirectly classifying whether two input faces match. Hope you will answer my question @pault – Guizmo Charo Feb 14 '19 at 22:35

What @pault stated is misleading:

If you have a really low AUC, that just means that your "positive" and "negative" labels are switched.

AUC = 0 implies that every truly positive data point is scored lower than every truly negative one, so at any threshold

  • all truly positive data points are classified as negative, or
  • all truly negative data points are classified as positive.

AUC = 1 implies that there is a threshold that perfectly separates the data.
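
To see why, recall that AUC equals the fraction of (positive, negative) pairs in which the positive example gets the higher score; a quick way to check this on the question's data:

pos = [s for s, t in zip(y_score, y_true) if t == 1]
neg = [s for s, t in zip(y_score, y_true) if t == 0]
# Pairwise (Mann-Whitney) computation; ties would count 0.5, none occur here
auc = sum(p > n for p in pos for n in neg) / (len(pos) * len(neg))
print(auc)  # 0.0 -- every positive is scored below every negative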

Talos