
While using the caret package for machine learning, I am stuck with caret's default "positive" outcome picking, i.e. the first level of the outcome factor in binary classification problems.

The package says it can be set to the alternative level. Can anybody help me define the positive outcome?

Thank you

  • In the confusion matrix you can set the "positive" parameter to your class of choice: `confusionMatrix(data = preds, reference = actual, positive = "YOUR_POSITIVE_CLASS")` – Zahra Apr 12 '17 at 17:41
  • The general solution to caret's "positive problem" is, of course, to relevel the factor. That may cause pain of other kinds elsewhere, but it's simple to do: just feed the DV to `factor()` with the `levels` argument ordered, e.g. `c("Positive", "Negative")` (if it's not already a factor, then that should be the `labels`, and `levels` should be `1:0` or whatever puts the current positive value first); see the sketch below these comments. – DHW Jul 19 '18 at 19:24
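
A minimal sketch combining both suggestions (the object names `preds` and `actual` and the level names here are illustrative, not from the question):

library(caret)

# option 1: name the positive class directly in confusionMatrix()
confusionMatrix(data = preds, reference = actual, positive = "Positive")

# option 2: rebuild the factor so the positive class is the first level;
# confusionMatrix() then treats it as positive by default
actual <- factor(actual, levels = c("Positive", "Negative"))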

2 Answers


Look at this example, extended from the caret examples for confusionMatrix.

library(caret)

# two classes; rev() makes "abnormal" the first factor level
lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)),
                levels = rev(lvs))
pred <- factor(
  c(
    rep(lvs, times = c(54, 32)),
    rep(lvs, times = c(27, 231))),
  levels = rev(lvs))

xtab <- table(pred, truth)

str(truth)
Factor w/ 2 levels "abnormal","normal": 2 2 2 2 2 2 2 2 2 2 ...

Because abnormal is the first level, it will be the default positive class:

confusionMatrix(xtab)

Confusion Matrix and Statistics

          truth
pred       abnormal normal
  abnormal      231     32
  normal         27     54

               Accuracy : 0.8285          
                 95% CI : (0.7844, 0.8668)
    No Information Rate : 0.75            
    P-Value [Acc > NIR] : 0.0003097       

                  Kappa : 0.5336          
 Mcnemar's Test P-Value : 0.6025370       

            Sensitivity : 0.8953          
            Specificity : 0.6279          
         Pos Pred Value : 0.8783          
         Neg Pred Value : 0.6667          
             Prevalence : 0.7500          
         Detection Rate : 0.6715          
   Detection Prevalence : 0.7645          
      Balanced Accuracy : 0.7616          

       'Positive' Class : abnormal     

To change the positive class to normal, just add the positive argument to confusionMatrix. Notice the differences from the previous output: they start at the sensitivity and carry through the other class-dependent statistics.

confusionMatrix(xtab, positive = "normal")

Confusion Matrix and Statistics

          truth
pred       abnormal normal
  abnormal      231     32
  normal         27     54

               Accuracy : 0.8285          
                 95% CI : (0.7844, 0.8668)
    No Information Rate : 0.75            
    P-Value [Acc > NIR] : 0.0003097       

                  Kappa : 0.5336          
 Mcnemar's Test P-Value : 0.6025370       

            Sensitivity : 0.6279          
            Specificity : 0.8953          
         Pos Pred Value : 0.6667          
         Neg Pred Value : 0.8783          
             Prevalence : 0.2500          
         Detection Rate : 0.1570          
   Detection Prevalence : 0.2355          
      Balanced Accuracy : 0.7616          

       'Positive' Class : normal 
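
For completeness, the releveling alternative mentioned in the comments, sketched with the same truth and pred objects from above:

# relevel() moves the named level to the front of the level order;
# since the first level is the default positive class, this makes
# "normal" positive without needing the positive argument
truth2 <- relevel(truth, ref = "normal")
pred2  <- relevel(pred,  ref = "normal")
confusionMatrix(table(pred2, truth2))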
– phiver

Changing the positive class:

A reliable way of doing this is to re-level the target variable.

For example, in the Breast Cancer Wisconsin dataset, the first level of Diagnosis determines the default positive class. The reference level of Diagnosis is:

cancer<-read.csv("breast-cancer-wisconsin.csv")
cancer$Diagnosis<-as.factor(cancer$Diagnosis)
levels(cancer$Diagnosis)
[1] "Benign"    "Malignant"

After performing the train-test split and model fit, the resulting confusion matrix and performance measures are:

Confusion Matrix and Statistics

           Actual
predicted    Benign Malignant
  Benign        115         7
  Malignant       2        80
                                      
           Accuracy : 0.9559          
             95% CI : (0.9179, 0.9796)
No Information Rate : 0.5735          
P-Value [Acc > NIR] : <2e-16                          
              Kappa : 0.9091                 
 Mcnemar's Test P-Value : 0.1824                  
        Sensitivity : 0.9829          
        Specificity : 0.9195          
     Pos Pred Value : 0.9426          
     Neg Pred Value : 0.9756          
         Prevalence : 0.5735          
     Detection Rate : 0.5637          
Detection Prevalence: 0.5980  
Balanced Accuracy   : 0.9512
'Positive' Class    : Benign 

Note that the positive class is Benign.

Changing the positive class to "Malignant" can be done with the relevel() function, which changes the reference level of the factor.

cancer$Diagnosis <- relevel(cancer$Diagnosis, ref = "Malignant")
levels(cancer$Diagnosis)
[1] "Malignant" "Benign"

Again, after performing the train-test split and model fitting, the confusion matrix and performance measures with the new reference level are:

Confusion Matrix and Statistics

           Actual
predicted    Malignant Benign
  Malignant         80      2
  Benign             7    115
                                      
           Accuracy : 0.9559          
             95% CI : (0.9179, 0.9796)
No Information Rate : 0.5735          
P-Value [Acc > NIR] : <2e-16                                   
          Kappa : 0.9091                               
 Mcnemar's Test P-Value : 0.1824                               
        Sensitivity : 0.9195          
        Specificity : 0.9829          
     Pos Pred Value : 0.9756          
     Neg Pred Value : 0.9426          
         Prevalence : 0.4265          
     Detection Rate : 0.3922          
Detection Prevalence : 0.4020          
Balanced Accuracy : 0.9512                                   
'Positive' Class : Malignant

Here the positive class is Malignant.