
I am trying to calculate the ROC curve of a binary (0, 1) target variable against the predictions of a decision tree.

When I pass the predictions as binary values, I get the following error:

> roc(as.numeric(pred),as.numeric(data$target))

Setting levels: control = 0, case = 1
Setting direction: controls < cases

When I pass the predicted probabilities instead, I get the following error:

> roc(pred[,2],as.numeric(data$target))

'response' has more than two levels. Consider setting 'levels' explicitly or using 'multiclass.roc' instead
Setting levels: control = 0.166666666666667, case = 0.232876712328767
Setting direction: controls < cases

So I am confused: what format should the predictions be in for the ROC to be calculated correctly? And why is my call producing these errors?

Konrad Rudolph
Yovo Zhang

1 Answer


If you look at the documentation of pROC's roc function, you will see that its definition has the following form:

## Default S3 method:
roc(response, predictor, [...]

The predictor is therefore the second argument, not the first as in your code. Your call should therefore look like:

roc(data$target, pred[,2])

If you can't remember the order, you can always use named arguments, which can be passed in any order:

roc(predictor = pred[,2], response = data$target)

Also note that it is not necessary, and in fact not recommended, to convert the response to a numeric vector, so I removed as.numeric from the calls above.
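Here is a reproducible sketch of the corrected call. It simulates a binary target and class-1 probabilities. The names target and prob, the sample size, and the seed are hypothetical stand-ins for data$target and pred[,2]. It also passes levels and direction explicitly. As a result, pROC does not need to auto-detect them and does not print the "Setting levels" and "Setting direction" lines, which are informational messages rather than errors.

```r
# Requires the pROC package: install.packages("pROC")
library(pROC)

set.seed(42)
n <- 200
# Hypothetical binary target, kept as a factor (no as.numeric needed)
target <- factor(rbinom(n, 1, 0.5), levels = c(0, 1))
# Hypothetical class-1 probabilities, higher on average for cases
prob <- plogis(rnorm(n, mean = ifelse(target == "1", 0.8, -0.8)))

# Response first, predictor second. Explicit levels (controls, cases)
# and direction ("<": controls score lower than cases) skip auto-detection
roc_obj <- roc(response = target, predictor = prob,
               levels = c("0", "1"), direction = "<")
auc(roc_obj)
```

With the arguments swapped, the probabilities would be treated as the response. That is why your second call warned that 'response' has more than two levels.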

Calimo
  • Thanks for your answer, Calimo! I have fixed my code accordingly. However, I noticed that the calculated AUC differs depending on the type of prediction I pass in. Why is that? – Yovo Zhang May 29 '20 at 19:24
  • For example, with binary predictions: "Data: as.numeric(pred) in 207 controls (tf$var 0) < 267 cases (tf$var 1). AUC: 0.676"; with probabilities: "Data: pred[, 2] in 207 controls (tf$var 0) < 267 cases (tf$var 1). AUC: 0.692" – Yovo Zhang May 29 '20 at 19:31
  • You shouldn't use binary predictions if you intend to do a ROC analysis. The ROC curve visits all thresholds, which is not possible if you've already binarized your predictions. As a result, a lower AUC is expected. Use the full range of probabilities, or whatever score you have, for ROC analysis. See this answer for more details: https://stats.stackexchange.com/a/372977/36682 (here "contingency table" is basically the same as binary prediction) – Calimo May 29 '20 at 20:04
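To see why binarizing lowers the AUC, here is a base-R sketch that does not need pROC. It computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. This rank formulation gives the same value as the trapezoidal area under the empirical ROC curve. The helper auc_rank, the simulated data, and the 0.5 cutoff are all hypothetical.

```r
# Rank-based AUC: P(score of a random case > score of a random control),
# with ties counted as 1/2
auc_rank <- function(response, score) {
  cases <- score[response == 1]
  controls <- score[response == 0]
  # Pairwise differences: one row per case, one column per control
  cmp <- outer(cases, controls, FUN = "-")
  mean((cmp > 0) + 0.5 * (cmp == 0))
}

set.seed(1)
n <- 500
# Hypothetical 0/1 response and predicted probabilities
response <- rbinom(n, 1, 0.5)
prob <- plogis(rnorm(n, mean = ifelse(response == 1, 0.7, -0.7)))

# Full range of probabilities: every distinct value is a possible threshold
auc_prob <- auc_rank(response, prob)
# Binarized at 0.5: only two distinct scores remain, i.e. a single threshold
auc_bin <- auc_rank(response, as.numeric(prob > 0.5))
c(probability = auc_prob, binary = auc_bin)
```

With only two distinct score values, the ROC curve has a single interior point. It collapses to two straight segments and cannot follow the curve traced by the probabilities. In addition, every tied case/control pair only earns half credit.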