What prediction format should be the input for the ROC function?

Question

I am trying to calculate the ROC of a binary (0, 1) target variable against a decision tree prediction.

When I set the prediction value to be binary, it gives me the following error:

> roc(as.numeric(pred),as.numeric(data$target))

Setting levels: control = 0, case = 1
Setting direction: controls < cases

When I set the prediction value to be a probability, it gives me the following error:

> roc(pred[,2],as.numeric(data$target))

'response' has more than two levels. Consider setting 'levels' explicitly or using 'multiclass.roc' instead
Setting levels: control = 0.166666666666667, case = 0.232876712328767
Setting direction: controls < cases

So I am confused: what format should the prediction have so that the ROC is calculated correctly? And why is my call producing these errors?

roc isn't a base R function. Several packages provide it; which one are you using? – Calimo, May 27 '20 at 05:30
I have been stuck on this for so long :( I still cannot figure out what type of prediction the roc() function expects as input. – Yovo Zhang, May 28 '20 at 00:38

score 0 · Accepted Answer · answered May 29 '20 at 12:49

If you look at pROC's roc function documentation, you will see that the formal definition has the following form:

## Default S3 method:
roc(response, predictor, ...)

The predictor is therefore the second argument, not the first as in your call. Your call should look like this:

roc(data$target, pred[,2])

If you forget the order, you can always use named arguments, so that the order no longer matters:

roc(predictor = pred[,2], response = data$target)

Also note that it is not necessary, and in fact not recommended, to convert the response to a numeric vector, so I removed as.numeric from the calls above.
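A minimal, self-contained sketch of the corrected call, assuming the pROC package is installed. The `data` and `pred` objects below are hypothetical toy stand-ins for the question's objects: a factor target and a two-column matrix of class probabilities, similar to what a tree model's `predict(..., type = "prob")` typically returns. Passing `levels` and `direction` explicitly also skips the automatic detection that prints the "Setting levels" / "Setting direction" messages.

```r
# Assumes the pROC package is installed: install.packages("pROC")
library(pROC)

# Hypothetical toy data standing in for the question's `data` and `pred`:
# a binary factor target and a two-column matrix of class probabilities.
data <- data.frame(target = factor(c(0, 0, 0, 1, 1, 1)))
pred <- cbind("0" = c(0.9, 0.8, 0.4, 0.6, 0.3, 0.1),
              "1" = c(0.1, 0.2, 0.6, 0.4, 0.7, 0.9))

# Response (the true classes) first, predictor (probability of class "1") second.
# Explicit levels and direction skip the automatic detection and its messages.
roc_obj <- roc(data$target, pred[, 2], levels = c("0", "1"), direction = "<")

# Equivalent call with named arguments, where the order does not matter.
roc_named <- roc(predictor = pred[, 2], response = data$target,
                 levels = c("0", "1"), direction = "<")

auc(roc_obj)
auc(roc_named)
```

With these toy values, 8 of the 9 case/control pairs are ranked correctly (only the control at 0.6 outscores the case at 0.4), so both calls should give an AUC of 8/9, about 0.889.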

Calimo

Thanks for your answer, Calimo! I have fixed my code accordingly. However, I noticed that the calculated AUC differs depending on the type of prediction input. Why is that? – Yovo Zhang May 29 '20 at 19:24
For example:
- binary prediction: Data: as.numeric(pred) in 207 controls (tf$var 0) < 267 cases (tf$var 1). AUC: 0.676
- probability prediction: Data: pred[, 2] in 207 controls (tf$var 0) < 267 cases (tf$var 1). AUC: 0.692
– Yovo Zhang May 29 '20 at 19:31
You shouldn't use binary predictions if you intend to do a ROC analysis. The ROC curve visits all thresholds, which is not possible once you have binarized your predictions, so a lower AUC is expected. Use the full range of probabilities, or whatever score you have, for the ROC analysis. See this answer for more details: https://stats.stackexchange.com/a/372977/36682 (here "contingency table" is basically the same as a binary prediction) – Calimo May 29 '20 at 20:04
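To see why binarizing costs AUC, here is a minimal base-R sketch (no packages) that computes the AUC as the fraction of case/control pairs in which the case scores higher, with ties counting one half; this equals the area under the empirical ROC curve. The helper `auc_mw`, the toy scores and the 0.5 cut-off are all hypothetical choices for illustration. After binarizing, only two distinct values remain, so every pair within the same predicted class becomes a tie.

```r
# AUC via the Mann-Whitney formulation: the fraction of (case, control)
# pairs in which the case gets the higher score; ties count as 1/2.
auc_mw <- function(response, predictor) {
  cases    <- predictor[response == 1]
  controls <- predictor[response == 0]
  pair_scores <- outer(cases, controls,
                       function(x, y) (x > y) + 0.5 * (x == y))
  mean(pair_scores)
}

# Hypothetical toy data: true classes and predicted probabilities of class 1.
target <- c(0, 0, 0, 1, 1, 1)
prob   <- c(0.1, 0.2, 0.6, 0.4, 0.7, 0.9)

# Binarizing at an (arbitrary) 0.5 cut-off keeps only two distinct values.
pred_class <- as.numeric(prob > 0.5)

length(unique(prob))        # 6 distinct thresholds available to the ROC
length(unique(pred_class))  # 2 left after binarizing

auc_prob  <- auc_mw(target, prob)        # 8/9, about 0.889
auc_class <- auc_mw(target, pred_class)  # 6/9, about 0.667
```

The exact gap depends on the data, but the binarized predictor always discards the ranking within each predicted class, which is the information the ROC curve is built from.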
