
After using logistic regression to make predictions on my test set, this is the confusion matrix I got:

True Positives: 3
False Positives: 1309
True Negatives: 12361
False Negatives: 4

Here is the roc_auc_score:

roc_auc_score(y_test, log_preds)
0.6664071480823492
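
(Aside: if log_preds are hard 0/1 class labels, roc_auc_score is effectively scoring a single operating point; it is normally given the positive-class probabilities instead. A minimal check, assuming lg is the fitted model:)

probas = lg.predict_proba(X_test)[:, 1]   # positive-class probabilities
print(roc_auc_score(y_test, probas))      # AUC computed over all thresholds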

So I want to visualize the ROC curve using this chunk of code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

probas = lg.predict_proba(X_test)[:, 1]
def get_preds(threshold, probabilities):
    return [1 if prob > threshold else 0 for prob in probabilities]
roc_values = []
for thresh in np.linspace(0, 1, 100):
    preds = get_preds(thresh, probas)
    tn, fp, fn, tp = confusion_matrix(y_test, log_preds).ravel()
    tpr = tp/(tp+fn)
    fpr = fp/(fp+tn)
    roc_values.append([tpr, fpr])
tpr_values, fpr_values = zip(*roc_values)
fig, ax = plt.subplots(figsize=(10,7))
ax.plot(fpr_values, tpr_values)
ax.plot(np.linspace(0, 1, 100),
         np.linspace(0, 1, 100),
         label='baseline',
         linestyle='--')
plt.title('Receiver Operating Characteristic Curve', fontsize=18)
plt.ylabel('TPR', fontsize=16)
plt.xlabel('FPR', fontsize=16)
plt.legend(fontsize=12);

Below is the output, which only shows the baseline, and I don't understand why. (My reputation is still not enough to embed an image, so please feel free to edit it in. Thanks!)

[Image: output of the ROC plot, showing only the diagonal baseline]


1 Answer

OK, now I've figured out what's going on.

I added these lines to see what was happening:

print(tpr_values)
print(fpr_values)

The output:

(0.42857142857142855, 0.42857142857142855, 0.42857142857142855, ..., 0.42857142857142855)
(0.09575713240673006, 0.09575713240673006, 0.09575713240673006, ..., 0.09575713240673006)

All 100 values in each tuple are identical, so every threshold collapses onto the same single point, and only the baseline is visible. In fact these are exactly the TPR and FPR of the confusion matrix from the question: 3/(3+4) and 1309/(1309+12361).
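
A quick arithmetic check, using the counts from the confusion matrix above:

tp, fn = 3, 4          # from the confusion matrix in the question
fp, tn = 1309, 12361
print(tp / (tp + fn))  # 0.42857142857142855, the constant TPR above
print(fp / (fp + tn))  # ~0.0958, the constant FPR above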

I solved my own problem. The output is normal now:

[Image: the corrected ROC curve]

The bug was here:

roc_values = []
for thresh in np.linspace(0, 1, 100):
    preds = get_preds(thresh, probas)
    tn, fp, fn, tp = confusion_matrix(y_test, log_preds).ravel()   # log_preds never changes, so thresh is ignored
    tpr = tp/(tp+fn)
    fpr = fp/(fp+tn)
    roc_values.append([tpr, fpr])
tpr_values, fpr_values = zip(*roc_values)

After I replaced log_preds with preds, it looks like this:

roc_values = []
for thresh in np.linspace(0, 1, 100):
    preds = get_preds(thresh, probas)
    tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()   # <-- preds instead of log_preds
    tpr = tp/(tp+fn)
    fpr = fp/(fp+tn)
    roc_values.append([tpr, fpr])
tpr_values, fpr_values = zip(*roc_values)
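
For the record, scikit-learn can do this threshold sweep for you with roc_curve. A minimal sketch, assuming the lg, X_test, and y_test from the question:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# roc_curve sweeps the decision thresholds internally and
# returns matched arrays of FPR and TPR.
probas = lg.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, probas)

fig, ax = plt.subplots(figsize=(10, 7))
ax.plot(fpr, tpr, label='logistic regression')
ax.plot([0, 1], [0, 1], label='baseline', linestyle='--')
ax.set_title('Receiver Operating Characteristic Curve', fontsize=18)
ax.set_ylabel('TPR', fontsize=16)
ax.set_xlabel('FPR', fontsize=16)
ax.legend(fontsize=12)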

It was quite frustrating, but it finally worked.
