
Consider this simple example:

library(tibble)

df <- data_frame(truth = c(1,1,0,0),
                 prediction = c(1,0,1,0),
                 n_obs = c(100,10,90,50))
df
# A tibble: 4 x 3
  truth prediction n_obs
  <dbl>      <dbl> <dbl>
1     1          1   100
2     1          0    10
3     0          1    90
4     0          0    50

I would like to pass this tibble to caret::confusionMatrix() so that I get all the metrics I need at once (accuracy, recall, etc.).

As you can see, the tibble contains all the information required to compute performance statistics. For instance, in the test dataset (not available here) there are 100 observations where the predicted label 1 matches the true label 1, while 90 observations with a predicted value of 1 are actually false positives.
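
In other words, the counts encode the 2x2 contingency table below. I have written it out by hand here just to show the shape I am after; as far as I know, confusionMatrix() accepts a table like this, and the part I am missing is how to build it from the tibble.

as.table(matrix(c(50, 90, 10, 100), nrow = 2,
                dimnames = list(prediction = c("0", "1"), truth = c("0", "1"))))
          truth
prediction   0   1
         0  50  10
         1  90 100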

I do not want to compute all the metrics by hand, and would like to resort to caret::confusionMatrix().

However, this has proven to be surprisingly difficult: calling confusionMatrix() directly on the tibble above does not work. Is there any solution here?

Thanks!

ℕʘʘḆḽḘ
  • What have you tried so far? Because you need other arguments to use `caret::confusionMatrix` – patL Jun 06 '18 at 15:50
  • Your table is not usable. How can you tell how many predictions you have as 1 and as 0? See what happens when you use `xtabs(n_obs ~ . , df)`. – phiver Jun 06 '18 at 15:51
  • Guys... the table gives you how many observations fall in each category. For instance, there are 100 observations in the testing dataset where the `predicted` value `1` matches the `label` value `1`. – ℕʘʘḆḽḘ Jun 06 '18 at 15:56
  • In essence, this table contains all the information we need to get confusionMatrix() working, but I was unable to do so. Perhaps by transforming it into a `table` beforehand? (caret accepts that as an input.) This problem arises because the table comes directly from Spark. – ℕʘʘḆḽḘ Jun 06 '18 at 15:59
  • Using your logic, truth and prediction are exactly the same; also, you have 100 and 90 predictions of 1, a.k.a. what xtabs shows. Your model achieves 100% accuracy? – phiver Jun 06 '18 at 16:01
  • I apologize, there was a typo in the `tibble`; thanks for catching that. – ℕʘʘḆḽḘ Jun 06 '18 at 16:03
  • @phiver Question updated with more detail. Thanks again! – ℕʘʘḆḽḘ Jun 06 '18 at 16:09
  • @phiver `xtabs(n_obs ~ . , df)` works like a charm. Do you mind posting this as an answer? Happy to accept then. – ℕʘʘḆḽḘ Jun 06 '18 at 16:14

1 Answer


You could use the following: xtabs() turns the counts into a 2x2 contingency table, which confusionMatrix() accepts directly. You have to set the positive class to "1", otherwise "0" will be taken as the positive class.

library(caret)
confusionMatrix(xtabs(n_obs ~ prediction + truth, df), positive = "1")

Confusion Matrix and Statistics

          truth
prediction   0   1
         0  50  10
         1  90 100

               Accuracy : 0.6             
                 95% CI : (0.5364, 0.6612)
    No Information Rate : 0.56            
    P-Value [Acc > NIR] : 0.1128          

                  Kappa : 0.247           
 Mcnemar's Test P-Value : 2.789e-15       

            Sensitivity : 0.9091          
            Specificity : 0.3571          
         Pos Pred Value : 0.5263          
         Neg Pred Value : 0.8333          
             Prevalence : 0.4400          
         Detection Rate : 0.4000          
   Detection Prevalence : 0.7600          
      Balanced Accuracy : 0.6331          

       'Positive' Class : 1    
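
If you would rather have observation-level factors (for example, to reuse them with other caret functions), an equivalent route, assuming the counts are small enough to expand in memory, is to repeat each row n_obs times and call confusionMatrix() on the factors directly. A minimal sketch:

library(caret)

# Expand the aggregated counts into one row per observation and build factors;
# the statistics are identical, only the dimension labels change to caret's
# default "Prediction"/"Reference".
truth      <- factor(rep(df$truth,      times = df$n_obs), levels = c(0, 1))
prediction <- factor(rep(df$prediction, times = df$n_obs), levels = c(0, 1))

confusionMatrix(data = prediction, reference = truth, positive = "1")
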
phiver