
Here are my results from caret's confusionMatrix() function in R, based on a Zero-R model. I may have set up the function incorrectly: there's a mismatch between the sensitivity I calculated manually (which varied with the randomized seed) and the function's answer of a flat 1.0000:

> sensitivity1 = 213/(213+128)
> sensitivity2 = 211/(211+130)
> sensitivity3 = 215/(215+126)
> #specificity = 0/(0+0) there were no other predictions
> specificity = 0
> specificity
[1] 0
> sensitivity1
[1] 0.6246334
> sensitivity2
[1] 0.6187683
> sensitivity3
[1] 0.6304985

There is a warning message, but the function does still run and refactors the data to match because the factor levels weren't in the same order; this may come from how the train and test splits are ordered and randomized. I went back and made sure the train and test sets didn't have reversed ordering from the negative-sign indexing, or different numbers of rows. Here are the results from caret's confusionMatrix() function:

> confusionMatrix(as.factor(testDiagnosisPred), as.factor(testDiagnosis), positive="B") 
Confusion Matrix and Statistics

          Reference
Prediction   B   M
         B 211 130
         M   0   0
                                          
               Accuracy : 0.6188          
                 95% CI : (0.5649, 0.6706)
    No Information Rate : 0.6188          
    P-Value [Acc > NIR] : 0.524           
                                          
                  Kappa : 0               
                                          
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 1.0000          
            Specificity : 0.0000          
         Pos Pred Value : 0.6188          
         Neg Pred Value :    NaN          
             Prevalence : 0.6188          
         Detection Rate : 0.6188          
   Detection Prevalence : 1.0000          
      Balanced Accuracy : 0.5000          
                                          
       'Positive' Class : B               
                                          
Warning message:
In confusionMatrix.default(as.factor(testDiagnosisPred), as.factor(testDiagnosis),  :
  Levels are not in the same order for reference and data. Refactoring data to match.
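
For what it's worth, I believe the warning appears because my predictions only contain "B", so as.factor() gives them a single level while the reference has two. Forcing both vectors to the same explicit levels, e.g.:

> confusionMatrix(factor(testDiagnosisPred, levels = c("B", "M")),
+                 factor(testDiagnosis, levels = c("B", "M")),
+                 positive = "B")

should silence the warning without changing the statistics, since the function refactors to match anyway.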

testDiagnosisPred just shows that the model guesses Benign (B) as the diagnosis for every sample in the test set; the counts vary with the seed because the actual Benign (B) and Malignant (M) cases get shuffled into different splits each time.

> table(testDiagnosisPred)
testDiagnosisPred
  B 
341 
> ## testDiagnosisPred
> ##   B 
> ## 228
> 
> majorityClass # class counts (not actually a confusion matrix)

  B   M 
211 130 
> ## 
> ##   B   M 
> ## 213 128
> 
> # the same class counts under another seed
> ## B   M 
> ## 211 130 
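
For reference, the gist of a ZeroR model is just this (a sketch with illustrative names, not my exact code):

> # tabulate the training classes and take the most frequent label
> classCounts <- table(cancerdata.train$Diagnosis)
> majorityLabel <- names(which.max(classCounts))
> # ZeroR prediction: assign the majority label to every test sample
> testDiagnosisPred <- rep(majorityLabel, length(testDiagnosis))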

Here's what some of the data looks like using the head() and str() functions:

> head(testDiagnosisPred)
[1] "B" "B" "B" "B" "B" "B"
> head(cancerdata.train$Diagnosis)
[1] "B" "B" "M" "M" "M" "B"
> head(testDiagnosis)
[1] "B" "B" "M" "M" "M" "B"
> 
> str(testDiagnosisPred)
 chr [1:341] "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" "B" ...
> str(cancerdata.train$Diagnosis)
 chr [1:341] "B" "B" "M" "M" "M" "B" "B" "B" "M" "M" "M" "B" "M" "M" "B" "M" "B" "B" "B" "M" "B" "B" "B" "B" ...
> str(testDiagnosis)
 chr [1:341] "B" "B" "M" "M" "M" "B" "B" "B" "M" "M" "M" "B" "M" "M" "B" "M" "B" "B" "B" "M" "B" "B" "B" "B" ...
> 
  • What is your question? What problem are you having? – akash87 Sep 17 '21 at 19:16
  • @akash87 how to make a zero-R classification model in R – cocoakrispies93 Sep 17 '21 at 19:23
  • @akash87 I have no idea if a single line in my code has anything to do with a zero-R classification model or what that looks like, I've been researching for days and asking my professor with no luck – cocoakrispies93 Sep 17 '21 at 19:26
  • @cocoakrispie93 Check this link out: – akash87 Sep 17 '21 at 21:09
  • @akash87 I've already checked that one out, I'm not sure what these functions are from or how to set them up: library(OneR) ZeroR <- function(x, ...) { output <- OneR(cbind(dummy = TRUE, x[ncol(x)]), ...) class(output) <- c("ZeroR", "OneR") output } predict.ZeroR <- function(object, newdata, ...) { class(object) <- "OneR" predict(object, cbind(dummy = TRUE, newdata[ncol(newdata)]), ...) } – cocoakrispies93 Sep 17 '21 at 21:33
  • @akash87 here's what my professor said: #assign majority class to samples #divide into test and training 60 and 40 #60% what's majority #prediction class assign majority class in training to all test' – cocoakrispies93 Sep 17 '21 at 21:35
  • Turns out the textbook didn't have R code or the lectures, it was all cleared up, sorry! – cocoakrispies93 Sep 18 '21 at 22:56
  • @akash87 I now have much more clarification and a more specific question for the zero R model's calculations – cocoakrispies93 Sep 19 '21 at 21:31
  • Need clarification on terminology and references for methods. Question uses terminology that is not widely understood. The later question was then answered by the questioner with no further clarification. Voting to close for missing data and obscure details on methods. – IRTFM Sep 20 '21 at 00:21
  • @IRTFM Oh hey it's you again, I'm going to go ahead and flag you for harassment. This one again says the package is caret, the function is confusionMatrix(), and the manual results differ from the function's results. – cocoakrispies93 Sep 20 '21 at 23:11
  • I don't think it's "harassment" to point out that you have asked a question about results that cannot possibly be reproduced because there is no [MCVE]. I'm not the only person to vote to close this question. It's basically asking for speculation about a bunch of results on a minimally described dataset using methods that are not offered at all in code and are very sketchily described and only present in comments. So learn to [edit] instead of comment. I think you should review the [ask] material. Consider CrossValidated.com and https://datascience.stackexchange.com/ – IRTFM Sep 20 '21 at 23:37
  • Guess the confusion matrix is confusing haha – cocoakrispies93 Sep 20 '21 at 23:55

1 Answer


The confusion with the confusion matrix and the sensitivity/specificity calculations happened because I read the matrix horizontally instead of vertically. The correct answer is the one from caret's confusionMatrix() function. Another way to see this: it's a ZeroR model, and on further investigation a ZeroR model always gives 1.00 sensitivity and 0.00 specificity when the positive class is the majority class. That's because ZeroR uses zero rules and zero attributes; it simply predicts the majority class for everything.

> confusionMatrix(as.factor(testDiagnosisPred), as.factor(testDiagnosis), positive="B") 
Confusion Matrix and Statistics

          Reference
Prediction   B   M
         B 211 130
         M   0   0
                                          
               Accuracy : 0.6188                  
                                          
            Sensitivity : 1.0000          
            Specificity : 0.0000 

When I did these manual specificity and sensitivity calculations, I misread the confusion matrix horizontally (along the Prediction rows) instead of vertically (down the Reference columns):

> sensitivity1 = 213/(213+128)
> sensitivity2 = 211/(211+130)
> sensitivity3 = 215/(215+126)
> #specificity = 0/(0+0) there were no other predictions
> specificity = 0
> specificity
[1] 0
> sensitivity1
[1] 0.6246334
> sensitivity2
[1] 0.6187683
> sensitivity3
[1] 0.6304985
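
Reading it vertically instead reproduces the function's values:

> # column-wise: sensitivity = TP/(TP + FN), specificity = TN/(TN + FP)
> sensitivity = 211/(211 + 0)  # all 211 actual B samples were predicted B
> specificity = 0/(0 + 130)    # none of the 130 actual M samples were predicted M
> sensitivity
[1] 1
> specificity
[1] 0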