I am dealing with a prediction problem where the data suffers from a strong imbalance in the binary prediction target. Is there a way of penalizing wrong predictions of the minority class with a cost matrix in tidymodels? I know that caret had this implemented, but the information I find for tidymodels is quite confusing. All I find is the `class_cost()` function from the experimental baguette package, which only seems to apply to bagged tree models.
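For reference, the bagged-tree route mentioned above looks roughly like this (a sketch, assuming baguette and an rpart engine are installed; `class_cost = 2` is an arbitrary example value, not a recommendation):

```r
# Sketch: baguette's bag_tree() exposes a class_cost argument, which
# weights misclassifications of the first factor level (typically set
# up to be the minority class) more heavily.
library(baguette)

spec <- bag_tree(class_cost = 2) |>
  set_engine("rpart") |>
  set_mode("classification")
```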

O René
- Maybe this [question](https://stackoverflow.com/questions/66759453/tidymodels-classify-as-true-only-if-the-probability-is-75-or-higher) or, better, the [probably package](https://probably.tidymodels.org/) can help you to post-process your model results. – Mischa Oct 27 '21 at 13:49
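A minimal sketch of the post-processing route from that comment, using `probably::make_two_class_pred()` to re-classify at a custom probability threshold (this assumes `Class1` is the event; the 0.25 threshold is an arbitrary example value):

```r
# Sketch: lower the classification threshold so more observations are
# labeled as the (minority) event class.
library(probably)
library(yardstick)  # for the two_class_example data

data(two_class_example)

two_class_example$pred_25 <- make_two_class_pred(
  estimate  = two_class_example$Class1,
  levels    = levels(two_class_example$truth),
  threshold = 0.25  # below 0.5: more cases classified as the event
)
```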
1 Answer
Yes, you want to use the `classification_cost()` metric from the yardstick package:
```r
library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

# Two class example
data(two_class_example)

# Assuming `Class1` is our "event", this penalizes false positives heavily
costs1 <- tribble(
  ~truth,   ~estimate, ~cost,
  "Class1", "Class2",  1,
  "Class2", "Class1",  2
)

# Assuming `Class1` is our "event", this penalizes false negatives heavily
costs2 <- tribble(
  ~truth,   ~estimate, ~cost,
  "Class1", "Class2",  2,
  "Class2", "Class1",  1
)

classification_cost(two_class_example, truth, Class1, costs = costs1)
#> # A tibble: 1 × 3
#>   .metric             .estimator .estimate
#>   <chr>               <chr>          <dbl>
#> 1 classification_cost binary         0.288

classification_cost(two_class_example, truth, Class1, costs = costs2)
#> # A tibble: 1 × 3
#>   .metric             .estimator .estimate
#>   <chr>               <chr>          <dbl>
#> 1 classification_cost binary         0.260
```
Created on 2021-10-27 by the reprex package (v2.0.1)
In tidymodels, you can use this metric either to compute results after the fact or as the metric to optimize during tuning.
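As a sketch of the tuning route (this assumes a recent yardstick version that provides `metric_tweak()`, which pins static arguments such as `costs` so the metric can be bundled into a `metric_set()`; the metric name `"cost_metric"` is an arbitrary label):

```r
# Sketch: fix the cost matrix with metric_tweak(), then bundle the
# result into a metric_set() for the `metrics` argument of a tuning
# function such as tune_grid().
library(yardstick)
library(dplyr)

costs <- tribble(
  ~truth,   ~estimate, ~cost,
  "Class1", "Class2",  2,
  "Class2", "Class1",  1
)

cost_metric <- metric_tweak("cost_metric", classification_cost, costs = costs)
my_metrics  <- metric_set(cost_metric, roc_auc)

# e.g. tune::tune_grid(wf, resamples = folds, metrics = my_metrics)
```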

Julia Silge
- Thank you for showing me this, I somehow did not find it on my own. If I may pose a follow-up question: how would I go about using this metric for tuning? Would I simply pass the `classification_cost` function with its cost matrix to the `metrics` argument in `tune_grid()`? So far I have not used functions that take arguments as the `metrics` argument. – O René Oct 27 '21 at 16:38
- You create a `metric_set()` and then use that in a tuning function. I have a couple of blog posts that demonstrate this, [such as this one](https://juliasilge.com/blog/baseball-racing/). – Julia Silge Oct 27 '21 at 18:53