SMOTE for regression on unbalanced features

Question

I am working on a regression model with numerical features and a numerical target.

y: the weight of waste collected in recycling bins
Xi: features describing the demography or urban elements around the bin, or the appearance of the bin

I noticed that the features that seem to have no impact on the target are almost the same as the features that are unbalanced in the dataset.

e.g. "type of bin": 66 bins are buried vs. 752 above ground
*(NB: I encoded this feature as 0/1 to get numerical data.)*

I would like to check whether these features have more impact when I use over-sampling.

I first tried a hand-made approach: duplicating the rows of the minority class.

e.g. I duplicated the 66 buried bins 5 times.
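The duplication step can be sketched with NumPy as below. This is a minimal illustration, not the asker's actual code: the helper name `duplicate_minority` and its parameters are hypothetical, and it assumes "duplicated 5 times" means appending 5 extra copies of each buried row.

```python
import numpy as np

def duplicate_minority(X, y, feature_idx, minority_value=1, n_copies=5):
    """Naive oversampling on a feature: append n_copies extra copies of every
    row whose binary feature X[:, feature_idx] equals minority_value."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = X[:, feature_idx] == minority_value
    # np.repeat along axis 0 copies each selected row n_copies times.
    X_extra = np.repeat(X[mask], n_copies, axis=0)
    y_extra = np.repeat(y[mask], n_copies, axis=0)
    return np.vstack([X, X_extra]), np.concatenate([y, y_extra])
```

With the numbers from the question (66 buried vs. 752 above ground) and `n_copies=5`, the result would contain 66 × 6 = 396 buried rows out of 1,148.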

For some features, the linear regression coefficients were significantly higher, but the random forest feature importances were not.
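One way to see why the linear coefficients move: for least squares, repeating a row is exactly the same as giving it a larger weight, so duplication reweights the minority rows without adding new information (and any significance test run on the duplicated data treats the copies as independent observations). The sketch below checks this on made-up toy data; it uses plain NumPy least squares rather than any particular library, and the helper `ols_coefficients` is hypothetical.

```python
import numpy as np

def ols_coefficients(X, y, sample_weight=None):
    """(Weighted) least squares with an intercept; returns [intercept, coef_1, ...]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    w = np.ones(len(X)) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    sw = np.sqrt(w)  # weighted LS == ordinary LS on rows scaled by sqrt(weight)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

# Made-up toy data: column 0 is a 0/1 "buried" flag (5 of 50 rows).
rng = np.random.default_rng(0)
flag = np.zeros(50)
flag[:5] = 1.0
X = np.column_stack([flag, rng.normal(size=50)])
y = 3.0 + 2.0 * flag + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=50)

# Option A: physically duplicate the minority rows 5 extra times.
mask = flag == 1.0
X_dup = np.vstack([X, np.repeat(X[mask], 5, axis=0)])
y_dup = np.concatenate([y, np.repeat(y[mask], 5)])
coef_dup = ols_coefficients(X_dup, y_dup)

# Option B: keep the original rows, but give each minority row weight 6.
coef_weighted = ols_coefficients(X, y, sample_weight=np.where(mask, 6.0, 1.0))

print(coef_dup)
print(coef_weighted)  # matches coef_dup up to floating-point rounding
```

In scikit-learn, both `LinearRegression.fit` and `RandomForestRegressor.fit` accept a `sample_weight` argument, which gives the same kind of reweighting without materializing copies.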

I would like to refine these results using SMOTE, in order to conclude whether these features have an impact on the target.

I found that SMOTE can be used for regression with the smogn or resreg packages.

But I couldn't find how to apply it to the features rather than to the target (here, the imbalance is in the features, not in the target).

Do you know a way to solve this? (That is, can I change a parameter of SMOTE, or use another function, so that it acts on the features and not on the target?)
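A possible workaround, sketched below under stated assumptions, is to let the imbalanced binary feature play the role of the "class" that SMOTE normally takes from the label, and to interpolate the regression target together with the other columns. This is a hand-rolled SMOTE-style routine, not the API of smogn, resreg or imbalanced-learn; the function `smote_on_feature` and all of its parameters are hypothetical. It computes distances on the raw columns, so features should be standardized first, and any other 0/1 columns will come out fractional (imbalanced-learn's `SMOTENC` is designed for mixed categorical/continuous data). With imbalanced-learn one could achieve something similar by passing the binary feature as `y` to `SMOTE().fit_resample` and appending the real target as an extra column of `X`.

```python
import numpy as np

def smote_on_feature(X, y, feature_idx, n_new, k=5, minority_value=1, seed=None):
    """SMOTE-style oversampling where the minority group is defined by the
    binary feature X[:, feature_idx] instead of by the target.

    Each synthetic row lies on the segment between a minority row and one of
    its k nearest minority neighbours; y is interpolated with the same factor
    (an assumption, similar in spirit to SMOTE for regression)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.flatnonzero(X[:, feature_idx] == minority_value)
    Xm, ym = X[idx], y[idx]
    if len(Xm) < 2:
        raise ValueError("need at least two minority rows")
    k = min(k, len(Xm) - 1)

    # Euclidean distances within the minority group; a row is not its own neighbour.
    d = np.linalg.norm(Xm[:, None, :] - Xm[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]

    # Pick a random base row, one of its k neighbours, and a gap in [0, 1).
    base = rng.integers(0, len(Xm), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random(n_new)

    X_new = Xm[base] + gap[:, None] * (Xm[nb] - Xm[base])
    y_new = ym[base] + gap * (ym[nb] - ym[base])
    # Both endpoints share minority_value, so the flag column stays exactly 0/1.
    return np.vstack([X, X_new]), np.concatenate([y, y_new])
```

As with duplication, the synthetic rows are derived from the existing 66 buried bins, so it is safer to oversample only the training split and to judge any change in coefficients or importances on untouched held-out data.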

What have you tried, and what went wrong with your attempt? Please edit your question to provide a minimal reproducible example with sample input and expected output, so that we can understand how to help — G. Anderson, Feb 02 '21 at 18:37
OK, sorry, I will try to be more precise, even if my issue is that I don't know what to try — Elise1369, Feb 02 '21 at 18:41
