4

ROC curves plot TPR vs. FPR, varying the threshold over the rank-ordered predicted probabilities of the training set. The threshold that gets picked is the probability associated with the point closest to the top-left corner, which roughly maximizes the TPR while minimizing the FPR.

However, let's say my application calls for minimizing the false negative rate. Then how would this curve change? How about a balance between the two?

serv-inc
SriK
  • ROC curves don't pick any threshold; they just display TPR vs FPR. There's no built-in threshold picker. – Calimo Apr 05 '16 at 06:39
  • The threshold in the top left corner does not minimize the FPR versus the FNR. It is just a point with a good balance of TP and FP: if you wanted to have *no* FP you could set your threshold to 1, but then you would have no TP either (and many FN, and a pretty useless classifier anyway). If you want your FN to weigh more than your FP, i.e. to penalize them more, just use a loss matrix when fitting your classifier. Not many libraries accept one, though. What kind of model are you using? – lrnzcig Apr 05 '16 at 14:23
  • Thank you for your comments. It was just a question I had, independent of the model. Some applications place a higher weight on false negatives, so I will look into applying a loss matrix. But from your comment, the ROC plot would not help balance FNs. Perhaps I could plot TN vs. FN, get that AUC, and weight the two AUCs together? – SriK Apr 06 '16 at 11:12
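Following up on the loss-matrix suggestion in the comments above, here is a minimal sketch of the same idea using class weights in scikit-learn (class weights are a rough stand-in for a full loss matrix; the imbalanced synthetic data and the 10:1 penalty on missed positives are purely illustrative):

    # Sketch: make false negatives cost more during fitting via class weights.
    # The 10:1 weight is illustrative, not a recommendation.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix

    X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Misclassified positives (false negatives) are penalized 10x during fitting.
    clf = LogisticRegression(class_weight={0: 1, 1: 10}).fit(X_train, y_train)
    print(confusion_matrix(y_test, clf.predict(X_test)))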

2 Answers

9

It seems to me that you are somewhat misunderstanding what a ROC curve is.

A ROC curve plots TPR vs FPR as the classification threshold is varied. In effect, it describes the relationship between three variables: FPR, TPR, and the threshold, even though only two of them are plotted. Each point on the curve shows the actual TPR and FPR for a specific threshold value. The lower-left corner of the graph always corresponds to a threshold of 1, while the upper-right corner corresponds to a threshold of 0.
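As a concrete illustration (a minimal sketch on synthetic data, not part of the original answer), scikit-learn's roc_curve returns the threshold behind each (FPR, TPR) point:

    # Sketch: a ROC curve is a set of (FPR, TPR) points, one per threshold.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve

    X, y = make_classification(n_samples=500, random_state=0)
    probs = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

    fpr, tpr, thresholds = roc_curve(y, probs)
    for f, t, thr in zip(fpr, tpr, thresholds):
        print(f"threshold={thr:.2f}  FPR={f:.2f}  TPR={t:.2f}")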

ROC curves have two common uses: comparing models independently of any particular threshold, and helping to select an appropriate threshold. The "proper threshold" for a predictive analytics application varies quite a bit with the specific problem you are attacking, but in general you can use a ROC curve to pick a threshold with an acceptable TPR/FPR tradeoff for your application. Simply picking the threshold for the point closest to the upper-left corner rarely gives the ideal outcome.

Once you pick a threshold that seems ideal from the ROC curve, you can investigate the confusion matrix and other evaluation metrics (precision, recall, accuracy, F1, etc.) to assess the threshold further.
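For example, continuing the snippet above (the 0.3 cutoff is an arbitrary choice for illustration):

    # Sketch: evaluate a candidate threshold picked off the ROC curve.
    # Uses `probs` and `y` from the previous snippet; 0.3 is arbitrary.
    from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

    threshold = 0.3
    preds = (probs >= threshold).astype(int)

    print(confusion_matrix(y, preds))
    print("precision:", precision_score(y, preds))
    print("recall:   ", recall_score(y, preds))
    print("F1:       ", f1_score(y, preds))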

To answer your direct question, you are correct that ROC curves don't directly show the FNR. In this case, you may want to use a Sensitivity/Specificity graph, which plots TPR vs TNR in a similar manner to the ROC curve. There is no standard evaluation method that I know of which looks directly at FNR. Instead, I usually just switch the "positive" and "negative" labels in my data and replot the ROC curve. This gives (effectively) TNR vs FNR.
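A sketch of that relabeling trick, again continuing the snippet above: treat the original negatives as the positive class and flip the scores, and the resulting "TPR vs FPR" plot is really TNR vs FNR.

    # Sketch: swap the positive/negative labels (and flip the scores).
    # The new TPR is the original TNR and the new FPR is the original FNR.
    from sklearn.metrics import roc_curve

    fnr, tnr, _ = roc_curve(1 - y, 1 - probs)
    for f, t in zip(fnr, tnr):
        print(f"FNR={f:.2f}  TNR={t:.2f}")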

4

However, let's say my application calls for minimizing the false negative rate. Then how would this curve change?

The curve itself would stay exactly the same, but you would no longer choose the top-left point (left circle in the picture). Instead, you would try to maximize the true positive rate, which equals 1 - FNR. TPR is trivially maximized by labeling every point as positive, but since that defeats the purpose of classification, you would instead choose a point closer to the top right of the ROC curve (right circle in the picture).

How about a balance between the two?

A point between the top left and the top right (the middle circle in the picture); one concrete way to pick such a point is sketched after the figure.

[figure: three points marked on a ROC curve]
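One way to make that balance concrete (a sketch on synthetic data, not part of the original answer): assign a cost to each error type and pick the threshold with the lowest total cost. The 5:1 cost ratio below is only an illustration of penalizing false negatives more than false positives.

    # Sketch: pick the threshold that minimizes a weighted error cost.
    # Synthetic data; the 5:1 cost ratio is illustrative, not a recommendation.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve

    X, y = make_classification(n_samples=500, random_state=0)
    probs = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

    COST_FN = 5.0  # a missed positive costs five times a false alarm
    COST_FP = 1.0

    fpr, tpr, thresholds = roc_curve(y, probs)
    n_pos, n_neg = (y == 1).sum(), (y == 0).sum()

    # Expected errors at each threshold: FN = (1 - TPR) * positives, FP = FPR * negatives.
    cost = COST_FN * (1 - tpr) * n_pos + COST_FP * fpr * n_neg
    best = np.argmin(cost)
    print(f"chosen threshold: {thresholds[best]:.2f} "
          f"(TPR={tpr[best]:.2f}, FPR={fpr[best]:.2f})")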

serv-inc