This is more of a theoretical question, but I am dealing with a pretty imbalanced dataset, so I want to use SMOTE to rebalance the data in order to get better results from my models. I have read that, to avoid data leakage, only the training set should be rebalanced (after the train/test split, of course). But if I use cross-validation, I think that at each iteration only the training folds should be rebalanced, and the validation fold should be left untouched, to avoid data leakage during cross-validation. This also matters for me when using Hyperopt to find the best hyperparameters for each model.
I read the imblearn documentation, but I can't find any information on whether this is also handled during cross-validation (i.e., rebalancing each training fold but not the validation fold):
https://imbalanced-learn.org/stable/common_pitfalls.html
Can someone help? I tried to find more information just to make sure this process runs correctly.