I'm having some difficulty understanding when upsampling should be applied to a training dataset, in tidymodels or otherwise.
For example, suppose you were building a classification model to predict whether a baseball player gets a hit (HIT) or not (NOHIT). In a dataset of 10,000 at-bats, approximately 2,700 to 3,000 of the outcomes would be HIT and the remainder would be NOHIT. That's baseball.
This is an unbalanced dataset, but the underlying system itself happens to be unbalanced. That being the case, should upsampling be applied to the target variable of our classification model, or would doing so produce erroneous results?
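
To make the question concrete, here is a minimal sketch of what I understand upsampling to do to the class balance. It is plain Python rather than tidymodels, the helper name `upsample_minority` and the exact count of 2,800 hits are made up for illustration, and it is not meant to reflect how any particular library implements the step.

```python
import random
from collections import Counter


def upsample_minority(labels, seed=0):
    """Resample smaller classes with replacement until each matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class
    out = list(labels)
    for cls, n in counts.items():
        pool = [y for y in labels if y == cls]
        # Draw (target - n) extra copies; this is zero for the majority class.
        out.extend(rng.choices(pool, k=target - n))
    return out


# Hypothetical training data: 10,000 at-bats, 2,800 of them hits (assumed figure).
at_bats = ["HIT"] * 2800 + ["NOHIT"] * 7200
balanced = upsample_minority(at_bats)

original_rate = at_bats.count("HIT") / len(at_bats)
balanced_rate = balanced.count("HIT") / len(balanced)
print(f"HIT rate before: {original_rate:.2f}, after: {balanced_rate:.2f}")
```

After this step the training data contains HIT and NOHIT in equal numbers, even though the real-world hit rate is closer to 28%. That gap between the resampled rate and the true rate is what my question is about.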