
Youden’s J statistic

J = Sensitivity + Specificity – 1

J = Sensitivity + (1 – FalsePositiveRate) – 1

J = TruePositiveRate – FalsePositiveRate

The goal is to find the threshold that maximizes TPR while minimizing FPR, i.e. the threshold with the largest J.
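As a quick check that the three forms of J above agree, here is a minimal sketch that computes J from raw confusion-matrix counts. The helper name `youden_j` and the counts are hypothetical, chosen only for illustration.

```python
def youden_j(tn: int, fp: int, fn: int, tp: int) -> float:
    """Youden's J from confusion-matrix counts, computed as TPR - FPR."""
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    return tpr - fpr

# Hypothetical counts, not taken from the model in this question.
tn, fp, fn, tp = 80, 20, 10, 40

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
j_from_rates = sensitivity + specificity - 1  # first form of J
j = youden_j(tn, fp, fn, tp)                  # last form of J

print(j, j_from_rates)
```

Both printed values should match up to floating-point rounding, since Specificity = 1 - FalsePositiveRate.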

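import numpy as np
from sklearn.metrics import roc_curve
# Pick the threshold that maximizes Youden's J = TPR - FPR
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=y_score)
idx = np.argmax(tpr - fpr)
print(f"threshold is {thresholds[idx]} and fpr is {fpr[idx]} and tpr is {tpr[idx]}")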

threshold is 0.2578948736190796 and fpr is 0.19498432601880877 and tpr is 0.7580246913580246

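The confusion matrix at this threshold is:

from sklearn.metrics import confusion_matrix
# Note: roc_curve treats score >= threshold as positive; a strict > is used here
y_pred = (y_score > thresholds[idx]).astype(np.float32)
cm = confusion_matrix(y_test, y_pred)
cm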
array([[1290,  305],
       [ 100,  305]], dtype=int64)

I also calculated the accuracy at each threshold, and the confusion matrix at the maximum-accuracy threshold looks much better than the one at the Youden threshold.

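from sklearn.metrics import accuracy_score
accuracies = []
for t in thresholds:
    # Threshold the raw scores (y_score), not the already-binarized y_pred
    y_z = (y_score > t).astype(int)
    accuracies.append(accuracy_score(y_test, y_z))
idx = np.argmax(accuracies)
print(accuracies[idx])
print(thresholds[idx])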

Accuracy -> 0.8655 Threshold at 0.6422194

The confusion matrix at this threshold is:

array([[1555,   40],
       [ 227,  178]], dtype=int64)

Accuracy is much higher than before and FPR is much lower (40/1595 ≈ 0.025), although TPR is also lower (178/405 ≈ 0.44 versus 305/405 ≈ 0.75).
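To compare the two operating points on the same footing, the sketch below recomputes accuracy, TPR, FPR and J directly from the two confusion matrices posted above. It assumes the [[TN, FP], [FN, TP]] layout that sklearn's `confusion_matrix` uses for 0/1 labels; the helper name `rates` is made up for this example.

```python
def rates(cm):
    """Return accuracy, TPR, FPR and Youden's J for a 2x2 matrix laid out
    as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return {"accuracy": accuracy, "tpr": tpr, "fpr": fpr, "j": tpr - fpr}

cm_youden = [[1290, 305], [100, 305]]   # matrix posted for the Youden threshold
cm_accuracy = [[1555, 40], [227, 178]]  # matrix posted for the max-accuracy threshold

m_youden = rates(cm_youden)
m_accuracy = rates(cm_accuracy)

for name, m in (("Youden", m_youden), ("max accuracy", m_accuracy)):
    print(name, {k: round(v, 4) for k, v in m.items()})
```

With 1595 negatives and 405 positives, accuracy is dominated by the negative class, so a threshold can score higher on accuracy while scoring lower on J.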

I calculated the accuracy at each threshold because the accuracy at the threshold chosen by Youden's index is much lower.
That doesn't make sense to me, because the best evaluation of the model should be at the threshold where TPR - FPR is maximal, and that is what Youden's index is about.
Compare the confusion matrix at the Youden threshold with the one at the maximum-accuracy threshold.
So I conclude that the values returned by roc_curve may not be accurate.
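One way to probe this is to sweep thresholds on data where the ground truth is known and record both criteria side by side. The following is a dependency-free sketch on synthetic, imbalanced data; the class ratio, score distributions and the `evaluate` helper are all assumptions made for illustration, not the model from this question.

```python
import random

rng = random.Random(0)

# Synthetic, imbalanced labels (assumed): roughly 80% negatives.
y_true = [1 if rng.random() < 0.2 else 0 for _ in range(500)]
# Positives score higher on average, with overlap between the classes.
y_score = [rng.gauss(0.6 if y else 0.4, 0.15) for y in y_true]

n_pos = sum(y_true)
n_neg = len(y_true) - n_pos

def evaluate(threshold):
    """Accuracy and Youden's J when predicting 1 for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    tn = n_neg - fp
    accuracy = (tp + tn) / len(y_true)
    j = tp / n_pos - fp / n_neg
    return accuracy, j

# Use every distinct score as a candidate threshold.
candidates = sorted(set(y_score))
results = {t: evaluate(t) for t in candidates}

t_acc = max(candidates, key=lambda t: results[t][0])  # maximizes accuracy
t_j = max(candidates, key=lambda t: results[t][1])    # maximizes Youden's J

print("max-accuracy threshold:", round(t_acc, 3), results[t_acc])
print("max-J threshold:       ", round(t_j, 3), results[t_j])
```

Accuracy and J are different objectives, so the two selected thresholds need not coincide, especially when one class is much larger than the other.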

ROC curve

Is there any justification for this, or where did I go wrong?

Sauron
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Aug 03 '23 at 18:14
  • I am trying to find the optimal threshold as explained in this article: https://machinelearningmastery.com/threshold-moving-for-imbalanced-classification/ However, the accuracy at the threshold chosen that way is much lower, so I calculated the accuracy at every threshold; as you can see, the confusion matrices differ. – Sauron Aug 03 '23 at 19:55

0 Answers