
I am trying to calculate accuracy using the ROCR package in R, but the result is different from what I expected.

Assume I have the predictions of a model (p) and the true labels (l) as follows:

p <- c(0.61, 0.36, 0.43, 0.14, 0.38, 0.24, 0.97, 0.89, 0.78, 0.86)
l <- c(1,     1,    1,    0,    0,     1,    1,    1,    0,     1)

I am calculating the accuracy of these predictions with the following commands:

library(ROCR)
pred <- prediction(p, l)
perf <- performance(pred, "acc")
max(perf@y.values[[1]])

but the result is 0.8, whereas according to the accuracy formula (TP+TN)/(TP+TN+FP+FN) it should be 0.6. Why is that?
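
For reference, this is the arithmetic I had in mind, applying a 0.5 cutoff by hand to the p and l vectors above (a minimal sketch of my expectation):

pred_class <- as.integer(p >= 0.5)   # predicted class at a fixed 0.5 cutoff
mean(pred_class == l)                # (TP + TN) / (TP + TN + FP + FN) = 0.6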


1 Answer


When you use max(perf@y.values[[1]]), it computes the maximum accuracy across all possible cutoffs for predicting a positive.

In your case, the optimal threshold is p=0.2, at which you make 2 mistakes (on the observations with predicted probabilities 0.38 and 0.78), yielding a maximum accuracy of 0.8.

You can access the cutoffs for your perf object using perf@x.values[[1]].
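
For example, lining up the cutoffs with their accuracies shows where the 0.8 comes from (a minimal sketch using the perf object built in the question):

data.frame(cutoff   = perf@x.values[[1]],   # Inf plus each distinct predicted value
           accuracy = perf@y.values[[1]])   # accuracy when everything at or above the cutoff is called positive
# The maximum, 0.8, occurs at cutoff 0.24; the conventional 0.5 cutoff gives the 0.6 you expected.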

  • Thank you for your answer. How can one get the accuracy for the `0.5` cut-off? Is it `max(perf@y.values[[0.5]])`? Also, do you know why we need `max()` in getting these? – Zhubarb Aug 27 '14 at 13:23
  • @Zhubarb with ROCR I would use `perf@y.values[[1]][max(which(perf@x.values[[1]] >= 0.5))]`. – josliber Aug 27 '14 at 14:22
  • Great, so in this specific example `perf@x.values= Inf 0.97 0.89 0.86 0.78 0.61 0.43 0.38 0.36 0.24 0.14` and `perf@y.values=0.3 0.4 0.5 0.6 0.5 0.6 0.7 0.6 0.7 0.8 0.7`. Therefore your code is returning the `acc` for `cut_off=0.61` (which is the smallest cut-off larger than 0.5), is that correct? – Zhubarb Aug 27 '14 at 14:27
  • 1
    @Zhubarb that's the accuracy for all cutoffs in the range 0.43-0.61. – josliber Aug 27 '14 at 22:59
  • I see, that makes more sense actually. I assumed the cut-offs were somewhat random. Thanks a lot! – Zhubarb Aug 28 '14 at 07:14