I am working on a regression model with numerical features and a numerical target.
y: the weight of waste collected in recycling bins
Xi: features describing the demography or urban elements around the bin, or the appearance of the bin
I noticed that the features that seem to have no impact on the target are almost exactly the features that are imbalanced in the dataset.
e.g. "type of bin": 66 bins are buried vs 752 above ground
*(NB: I encoded this as 0/1 to get numerical data)*
I would like to check whether these features have more impact when using over-sampling.
I first tried a manual approach: duplicating the rows of the minority class.
e.g. I duplicated the 66 buried bins 5 times.
For some features, the linear regression coefficients were significantly higher, but the random forest feature importances were not.
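For reference, the manual over-sampling experiment described above can be sketched roughly as follows. This is only an illustration on synthetic data, not my real dataset: the second feature (`population_density`), the way `y` is generated, and the helper `oversample_minority` are all hypothetical, and "5 times" is read here as 5 extra copies of each buried bin.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 752 above-ground bins (0), 66 buried bins (1).
n_above, n_buried = 752, 66
is_buried = np.concatenate([np.zeros(n_above), np.ones(n_buried)])
population_density = rng.normal(size=n_above + n_buried)  # hypothetical second feature
X = np.column_stack([is_buried, population_density])
# Hypothetical target: depends on both features plus noise.
y = 2.0 * is_buried + 1.0 * population_density + rng.normal(size=len(is_buried))


def oversample_minority(X, y, column, copies):
    """Append `copies` extra duplicates of every row where X[:, column] == 1."""
    minority = X[:, column] == 1
    X_extra = np.repeat(X[minority], copies, axis=0)
    y_extra = np.repeat(y[minority], copies)
    return np.vstack([X, X_extra]), np.concatenate([y, y_extra])


# Assumption: 5 extra copies of each buried bin.
X_over, y_over = oversample_minority(X, y, column=0, copies=5)

# Fit both models on the original and on the over-sampled data and compare.
datasets = {"original": (X, y), "oversampled": (X_over, y_over)}
for name, (X_fit, y_fit) in datasets.items():
    linear = LinearRegression().fit(X_fit, y_fit)
    forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_fit, y_fit)
    print(name, "coefficients:", linear.coef_)
    print(name, "importances:", forest.feature_importances_)
```

Note that exact duplicates add no new information: they only reweight the minority rows, which may be why the two models react differently.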
I would like to refine my results by using SMOTE, in order to conclude whether these features have an impact on the target.
I found that SMOTE can be used for regression with the smogn or resreg packages.
But I couldn't find how to apply it to the features rather than the target (here the imbalance is in the features, not in the target).
Do you know a way to solve this? (I mean: is there a SMOTE parameter I can change, or another function I can use, so that it acts on the features and not on the target?)