I have a machine learning model and a dataset with 15 features about breast cancer, and I want to predict the status of a person (alive or dead). The classes are imbalanced: 85% of the cases are alive and only 15% are dead. To deal with this, I want to use oversampling and combine it with stratified k-fold cross-validation. I wrote the following code, and it seems to work well, but I don't know whether I put the steps in the right order:
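from sklearn.model_selection import StratifiedKFold
from imblearn.over_sampling import RandomOverSampler

# x: DataFrame with the 15 features, y: Series with the status (pandas, since .iloc is used)
skf = StratifiedKFold(n_splits=10, random_state=None)
skf.get_n_splits(x, y)  # only returns the number of folds (10); does not change skf
ros = RandomOverSampler(sampling_strategy="not majority")
x_res, y_res = ros.fit_resample(x, y)  # oversample the whole dataset first
for train_index, test_index in skf.split(x_res, y_res):  # then split the resampled data
    x_train, x_test = x_res.iloc[train_index], x_res.iloc[test_index]
    y_train, y_test = y_res.iloc[train_index], y_res.iloc[test_index]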
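One way to check what this ordering does is to count, for each fold, how many test rows have an exact copy in the training fold. The sketch below rests on assumptions not in the code above: synthetic NumPy data (85 "alive" = 0, 15 "dead" = 1, 15 features) instead of pandas, and a hypothetical helper `random_oversample` written with NumPy as a stand-in for imblearn's `RandomOverSampler(sampling_strategy="not majority")`, which duplicates minority rows at random until each class matches the majority count.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def random_oversample(x, y, rng):
    """Hypothetical stand-in for RandomOverSampler(sampling_strategy="not majority"):
    keep every original row, then add random duplicates of each non-majority
    class until it has as many rows as the majority class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]
    for cls, count in zip(classes, counts):
        if count < target:
            idx = np.flatnonzero(y == cls)
            keep.append(rng.choice(idx, size=target - count, replace=True))
    order = np.concatenate(keep)
    return x[order], y[order]


def leaked_rows_per_fold(x, y, n_splits=10, seed=0):
    """Oversample first, then split (the order used above). For each fold,
    count test rows that also appear, identically, in the training fold."""
    rng = np.random.default_rng(seed)
    x_res, y_res = random_oversample(x, y, rng)
    skf = StratifiedKFold(n_splits=n_splits)
    leaks = []
    for train_index, test_index in skf.split(x_res, y_res):
        train_rows = {row.tobytes() for row in x_res[train_index]}
        leaks.append(sum(row.tobytes() in train_rows for row in x_res[test_index]))
    return leaks


rng = np.random.default_rng(42)
x = rng.normal(size=(100, 15))       # synthetic: 100 patients, 15 features
y = np.array([0] * 85 + [1] * 15)    # 85% alive (0), 15% dead (1)
print(leaked_rows_per_fold(x, y))
```

With 15 original "dead" rows duplicated up to 85, most "dead" rows in a test fold have a copy in the matching training fold, so a score measured this way partly reflects rows the model has already seen.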
Is it correct this way, with oversampling applied to the whole dataset before the stratified k-fold split? Or should I split first and apply oversampling inside each fold instead?
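For comparison, here is a sketch of the other ordering: split the original data with `StratifiedKFold` first, then oversample only the training part of each fold, so every test fold keeps only original rows in their natural 85/15 proportion. As an assumption, it uses synthetic NumPy data and a hypothetical NumPy helper `random_oversample` in place of imblearn's `RandomOverSampler(sampling_strategy="not majority")`; `model` in the comment is a placeholder for your classifier. If you use imblearn directly, its `imblearn.pipeline.Pipeline` applies samplers only during `fit`, so a pipeline of `RandomOverSampler` plus your model passed to `cross_val_score` with a `StratifiedKFold` should behave the same way.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def random_oversample(x, y, rng):
    """Hypothetical stand-in for RandomOverSampler(sampling_strategy="not majority")."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # every original row once
    for cls, count in zip(classes, counts):
        if count < target:
            idx = np.flatnonzero(y == cls)
            keep.append(rng.choice(idx, size=target - count, replace=True))
    order = np.concatenate(keep)
    return x[order], y[order]


def oversampled_folds(x, y, n_splits=10, seed=0):
    """Split the original data first, then oversample only each training fold."""
    rng = np.random.default_rng(seed)
    skf = StratifiedKFold(n_splits=n_splits)
    for train_index, test_index in skf.split(x, y):
        x_train, y_train = random_oversample(x[train_index], y[train_index], rng)
        x_test, y_test = x[test_index], y[test_index]  # test fold is never resampled
        yield x_train, y_train, x_test, y_test


rng = np.random.default_rng(42)
x = rng.normal(size=(100, 15))       # synthetic: 100 patients, 15 features
y = np.array([0] * 85 + [1] * 15)    # 85% alive (0), 15% dead (1)
for x_train, y_train, x_test, y_test in oversampled_folds(x, y):
    # model.fit(x_train, y_train), then evaluate on (x_test, y_test)
    print(np.bincount(y_train), np.bincount(y_test))
```

Each training fold comes out balanced, while each test fold stays imbalanced and contains no row that also appears in its training fold.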