I am dealing with a heavily imbalanced response variable, so my supervisor has recommended that I use SMOTE to upsample the minority observations in my data set. The data consists of many categorical predictors and, as I understand it, themis::step_smote from the tidymodels ecosystem only accepts numerical features so far.
I am aware that I can convert my factors and strings to numerical dummies by using recipes::step_dummy, but I am worried that SMOTE will generate synthetic observations whose dummy values do not make any logical sense (values strictly between 0 and 1, where logically only 0 and 1 are possible).
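To make the concern concrete, here is a minimal sketch of the interpolation step behind SMOTE as I understand it, written in plain Python just to show the arithmetic. It is not how themis implements it internally; the `smote_point` helper, the two rows, and the column layout are all made up for illustration.

```python
import random


def smote_point(x, neighbor, rng):
    """Create one synthetic row on the line segment between x and neighbor.

    Hypothetical helper: only the basic SMOTE interpolation idea,
    not the themis implementation.
    """
    gap = rng.random()  # uniform draw from [0, 1)
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]


# Two made-up minority rows with columns [age, colour_red, colour_blue],
# where the last two columns are dummies for a categorical predictor.
row_a = [30.0, 1.0, 0.0]  # colour = red
row_b = [40.0, 0.0, 1.0]  # colour = blue

rng = random.Random(1)  # fixed seed so the sketch is reproducible
synthetic = smote_point(row_a, row_b, rng)
print(synthetic)
```

When the two rows belong to different categories, as here, the dummy columns of the synthetic row end up as fractions that still sum to 1. If both rows shared the same category, the dummies would stay at exactly 0 and 1.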
Is this a legitimate concern at all or can I proceed with using SMOTE on categorical dummies?