Youden’s J statistic
J = Sensitivity + Specificity – 1
J = Sensitivity + (1 – FalsePositiveRate) – 1
J = TruePositiveRate – FalsePositiveRate
The goal is to maximize TPR while keeping FPR as low as possible.
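Written in confusion-matrix counts, this is J = TP/(TP + FN) – FP/(FP + TN), so J can also be read directly off a confusion matrix; the sanity checks below use this.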
import numpy as np
from sklearn.metrics import roc_curve, confusion_matrix, accuracy_score

# Youden's index: pick the threshold that maximizes TPR - FPR
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=y_score)
idx = np.argmax(tpr - fpr)
print(f"threshold is {thresholds[idx]} and fpr is {fpr[idx]} and tpr is {tpr[idx]}")
threshold is 0.2578948736190796 and fpr is 0.19498432601880877 and tpr is 0.7580246913580246
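Sanity check with the printed values: J = 0.7580 – 0.1950 ≈ 0.563 at this threshold, and by construction of the argmax no other threshold on this curve gives a larger TPR – FPR gap.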
The confusion matrix at this threshold is:
# Binarize with a strict ">"; roc_curve's points correspond to y_score >= threshold,
# so the counts below can differ slightly from fpr[idx]/tpr[idx] due to ties at the threshold.
y_pred = (y_score > thresholds[idx]).astype(np.float32)
cm = confusion_matrix(y_test, y_pred)
cm
array([[1290,  305],
       [ 100,  305]], dtype=int64)
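For reference, the per-class rates can be read off this matrix directly. A minimal sketch, assuming sklearn's binary layout [[TN, FP], [FN, TP]] and the cm computed above:

tn, fp, fn, tp = cm.ravel()
tpr_cm = tp / (tp + fn)                   # 305/405  ≈ 0.753
fpr_cm = fp / (fp + tn)                   # 305/1595 ≈ 0.191
acc_cm = (tp + tn) / (tn + fp + fn + tp)  # 1595/2000 ≈ 0.798
print(f"TPR={tpr_cm:.3f} FPR={fpr_cm:.3f} accuracy={acc_cm:.3f}")

So the accuracy at the Youden threshold is about 0.80.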
I also calculated the accuracy at every threshold, and the confusion matrix at the accuracy-maximizing threshold turned out to be much better than the one above:
# Sweep all ROC thresholds and keep the one with the highest accuracy
accuracies = []
for t in thresholds:
    y_z = (y_score > t).astype(int)  # threshold the raw scores
    accuracies.append(accuracy_score(y_test, y_z))
idx = np.argmax(accuracies)
print(accuracies[idx])
print(thresholds[idx])
Accuracy: 0.8655 at threshold 0.6422194
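As a cross-check, the same sweep can be done without the Python loop: since roc_curve already returns TPR and FPR per threshold, accuracy follows from the class counts. A minimal sketch, assuming binary 0/1 labels in y_test and sklearn's >= threshold convention (so it may differ by the ties at a threshold from the strict > used above):

P = (y_test == 1).sum()                      # positives: 405 in this data
N = (y_test == 0).sum()                      # negatives: 1595
acc = (tpr * P + (1 - fpr) * N) / (P + N)    # accuracy at every ROC threshold
best = np.argmax(acc)
print(acc[best], thresholds[best])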
The confusion matrix at this threshold is:
array([[1555,   40],
       [ 227,  178]], dtype=int64)
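Reading the same rates off this matrix: TPR = 178/405 ≈ 0.440, FPR = 40/1595 ≈ 0.025, accuracy = 1733/2000 ≈ 0.867.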
Accuracy is much higher than before and FPR is also much lower (although TPR drops from about 0.75 to about 0.44).
I calculated the accuracy at each threshold because the accuracy at the threshold chosen by Youden's index seemed far too low.
This doesn't make sense to me, because the best operating point of the model should be the threshold where TPR – FPR is maximal, and that is exactly what Youden's index picks.
Compare the two confusion matrices: the one at the Youden threshold versus the one at the accuracy-maximizing threshold.
So I conclude that the values returned by roc_curve may not be accurate.
Is there a justification for this, or where did I go wrong?