I am new to R and have created a classification model using tidymodels. Below is the result of collect_predictions(model):

collect_predictions(members_final) %>% print()

# A tibble: 19,126 x 6
   id               .pred_died .pred_survived  .row .pred_class died    
   <chr>                 <dbl>          <dbl> <int> <fct>       <fct>   
 1 train/test split      0.285          0.715     5 survived    survived
 2 train/test split      0.269          0.731     6 survived    survived
 3 train/test split      0.298          0.702     7 survived    survived
 4 train/test split      0.276          0.724     8 survived    survived
 5 train/test split      0.251          0.749    10 survived    survived
 6 train/test split      0.124          0.876    18 survived    survived
 7 train/test split      0.127          0.873    21 survived    survived
 8 train/test split      0.171          0.829    26 survived    survived
 9 train/test split      0.158          0.842    30 survived    survived
10 train/test split      0.150          0.850    32 survived    survived
# … with 19,116 more rows

It works with yardstick functions:

collect_predictions(members_final) %>%
  conf_mat(died, .pred_class)

          Truth
Prediction  died survived
  died       196     7207
  survived    90    11633

But when I pipe collect_predictions() into caret::confusionMatrix(), it doesn't work:

collect_predictions(members_final) %>% 
  caret::confusionMatrix(as.factor(died), as.factor(.pred_class))

############## output #################
Error: `data` and `reference` should be factors with the same levels.
Traceback:

1. collect_predictions(members_final) %>% caret::confusionMatrix(as.factor(died), 
 .     as.factor(.pred_class))

2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))

3. eval(quote(`_fseq`(`_lhs`)), env, env)

4. eval(quote(`_fseq`(`_lhs`)), env, env)

I am not sure what's wrong here. How can I fix it so that I can use caret's evaluation?

The purpose of using caret's evaluation is to find out which class is treated as positive and which as negative.

Is there any other way to find out the positive/negative classes? For example, is levels(df$class) the correct way to find the positive class used in the model?

StupidWolf
ViSa

1 Answer

If you have predictions, like your output of collect_predictions(), then you don't want to pipe it into a function from caret. It doesn't take the data as the first argument, the way that the yardstick functions do. Instead, pass in the arguments as vectors:

library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
data("two_class_example", package = "yardstick")

confusionMatrix(two_class_example$predicted, two_class_example$truth)
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction Class1 Class2
#>     Class1    227     50
#>     Class2     31    192
#>                                           
#>                Accuracy : 0.838           
#>                  95% CI : (0.8027, 0.8692)
#>     No Information Rate : 0.516           
#>     P-Value [Acc > NIR] : <2e-16          
#>                                           
#>                   Kappa : 0.6749          
#>                                           
#>  Mcnemar's Test P-Value : 0.0455          
#>                                           
#>             Sensitivity : 0.8798          
#>             Specificity : 0.7934          
#>          Pos Pred Value : 0.8195          
#>          Neg Pred Value : 0.8610          
#>              Prevalence : 0.5160          
#>          Detection Rate : 0.4540          
#>    Detection Prevalence : 0.5540          
#>       Balanced Accuracy : 0.8366          
#>                                           
#>        'Positive' Class : Class1          
#> 

Created on 2020-10-21 by the reprex package (v0.3.0.9001)

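In your case the columns are .pred_class (the predictions) and died (the truth). You'll need to save the data frame containing the predictions as an object so that you can access those columns with $ and pass them in, predictions first and truth second.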
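
A minimal sketch of that last step, using `two_class_example` from yardstick as a stand-in for your saved `collect_predictions()` output, so `predicted` and `truth` play the roles of `.pred_class` and `died`. The object name `preds` is only for illustration. It also shows one way to check which class is treated as positive: by default, both caret and yardstick use the first factor level.

```r
library(caret)

data("two_class_example", package = "yardstick")

# Stand-in for: preds <- collect_predictions(members_final)
preds <- two_class_example

# caret takes the predicted classes first (`data`), then the truth (`reference`)
cm <- confusionMatrix(data = preds$predicted, reference = preds$truth)

# The class caret treated as "positive"
cm$positive
#> [1] "Class1"

# That is the first factor level of the outcome
levels(preds$truth)
#> [1] "Class1" "Class2"

# To treat the other class as positive, name it explicitly
cm_second <- confusionMatrix(preds$predicted, preds$truth, positive = "Class2")

# with() evaluates the call inside the data frame, so bare column names work;
# because it takes the data first, it can also sit at the end of a pipe
cm_with <- with(preds, confusionMatrix(predicted, truth))
```

With your data this would be `preds <- collect_predictions(members_final)` followed by `confusionMatrix(preds$.pred_class, preds$died)`. On the yardstick side, the `event_level = "second"` argument of metrics such as `sens()` plays the same role as caret's `positive`.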

Julia Silge
  • Thanks for helping me. I even tried `collect_predictions(members_final) %>% caret::confusionMatrix(.$died, .$.pred_class)` before posting the question here, but it failed because, as you rightly said, confusionMatrix() doesn't accept the data as its first argument. I was practicing `tidymodels` with the `Tidytuesday Himalayan` data from your video and blog and had a doubt about the `positive class`, which I think I will ask you about in the video comments. Thanks again for helping me and for posting great content for everyone's learning! – ViSa Oct 22 '20 at 08:45