
I am using R v3.3.2 and caret 6.0.71 (i.e. the latest versions) to construct a logistic regression classifier. I am using the confusionMatrix function to create statistics for judging its performance.

logRegConfMat <- confusionMatrix(logRegPrediction, valData[,"Seen"])

          Reference
Prediction   0   1
         0  30  14
         1  60 164

Accuracy : 0.7239
Sensitivity : 0.3333
Specificity : 0.9213

The target value in my data (Seen) uses 1 for true and 0 for false. I assume the Reference (ground truth) columns and Prediction (classifier) rows in the confusion matrix follow the same convention. Therefore my results show:

  • True Negatives (TN) 30
  • True Positives (TP) 164
  • False Negatives (FN) 14
  • False Positives (FP) 60

Question: Why is sensitivity given as 0.3333 and specificity given as 0.9213? I would have thought it was the other way round - see below.

I am reluctant to believe that there is a bug in the R confusionMatrix function, as nothing has been reported and this would be a significant error.


Most references on calculating specificity and sensitivity define them as follows (e.g. www.medcalc.org/calc/diagnostic_test.php); a quick R check of the arithmetic is sketched after the list:

  • Sensitivity = TP / (TP+FN) = 164/(164+14) = 0.9213
  • Specificity = TN / (FP+TN) = 30/(60+30) = 0.3333
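
To double-check the arithmetic, here is a minimal R sketch with the counts hard-coded from the matrix above (it does not use caret at all, just the standard definitions):

# Counts read off the confusion matrix above, taking 1 as the positive class
TP <- 164   # Reference 1, Prediction 1
TN <- 30    # Reference 0, Prediction 0
FP <- 60    # Reference 0, Prediction 1
FN <- 14    # Reference 1, Prediction 0

TP / (TP + FN)   # sensitivity = 0.9213...
TN / (FP + TN)   # specificity = 0.3333...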

2 Answers


According to the documentation ?confusionMatrix:

"If there are only two factor levels, the first level will be used as the "positive" result."

Hence in your example the positive result will be 0, and the evaluation metrics will be the wrong way around. To override the default behaviour, you can set the positive argument to the correct value, like so:

 confusionMatrix(logRegPrediction, valData[,"Seen"], positive = "1")
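
For anyone who wants to see the effect in isolation, here is a minimal, self-contained sketch with made-up vectors (y_hat and y are just illustrative names, not the asker's objects):

library(caret)

# Toy predictions and reference labels using the same 0/1 coding
y_hat <- factor(c("0", "0", "1", "1", "1", "1"), levels = c("0", "1"))
y     <- factor(c("0", "1", "0", "1", "1", "1"), levels = c("0", "1"))

# Default behaviour: the first factor level ("0") is treated as positive,
# so sensitivity and specificity come out swapped relative to the
# usual 1-is-positive convention
confusionMatrix(y_hat, y)

# Explicitly declaring "1" as the positive class gives the expected metrics
confusionMatrix(y_hat, y, positive = "1")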
  • Thanks - I got the same answer from package author Max Kuhn. I would advise anyone using this function to explicitly give the positive argument to avoid this sort of problem. – wpqs Jan 03 '17 at 20:01
  • @mtoto thanks very much, I spent hours wondering about this problem – Diego Dec 08 '17 at 23:40
  • Whoa, a seriously dangerous, easy-to-overlook critical piece of knowledge - I think there are a lot of people just assuming it knows 1 = positive; this certainly had me stumped – James Jul 10 '20 at 18:48

confusionMatrix( y_hat, y, positive = "1" )

will redefine all the metrics using "1" as the positive outcome. For example, sensitivity and specificity will be swapped relative to the default, but the confusion matrix will still be displayed as before, i.e. in (0, 1) order. This can be rectified by reordering the factor levels of the classes as shown below.

y_hat = factor(y_hat, levels(y_hat)[c(2, 1)])

y = factor(y, levels(y)[c(2, 1)])

Now the matrix will be displayed in the order of (1, 0) with "1" as the positive outcome, and there is no need to use the positive="1" argument.
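
A minimal sketch of this approach, assuming made-up 0/1 factor vectors y_hat and y (not the asker's data):

library(caret)

y_hat <- factor(c("1", "1", "0", "1", "0", "1"), levels = c("0", "1"))
y     <- factor(c("1", "0", "0", "1", "1", "1"), levels = c("0", "1"))

# Move "1" to the front so it becomes the first level,
# i.e. the default positive class
y_hat <- factor(y_hat, levels(y_hat)[c(2, 1)])
y     <- factor(y, levels(y)[c(2, 1)])

# The matrix now prints in (1, 0) order and "1" is treated as positive
# without needing the positive = "1" argument
confusionMatrix(y_hat, y)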
