I am working on a regression model with numerical features and a numerical target.

y: the weight of waste collected in recycling bins
Xi: features describing the demography or urban elements around the bin, or the appearance of the bin

I noticed that the features that seem to have no impact on the target are almost the same as the features that are imbalanced in the dataset.

e.g. "type of bin" -> 66 are buried vs 752 above ground
*(NB: I encoded this feature as 0/1 to get numerical data)*

I would like to check whether these features have more impact when using over-sampling.

I first tried a naive approach: duplicating the rows of the minority class.

e.g. I duplicated the 66 buried bins 5 times.
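For illustration, the duplication step can be sketched as below. This is a minimal sketch on synthetic data, not the real dataset: the column `other_feature`, the target values, and the helper `oversample_by_duplication` are hypothetical, and "5 times" is assumed to mean that each buried bin appears 5 times in total.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data: 66 buried bins (1) and 752 above-ground bins (0).
bin_type = np.array([1] * 66 + [0] * 752)
other_feature = rng.normal(size=bin_type.size)  # hypothetical numeric feature
X = np.column_stack([bin_type, other_feature])
y = rng.normal(loc=50.0, scale=10.0, size=bin_type.size)  # hypothetical weights


def oversample_by_duplication(X, y, mask, times):
    """Return X, y where rows selected by `mask` appear `times` times in total."""
    repeats = np.where(mask, times, 1)
    return np.repeat(X, repeats, axis=0), np.repeat(y, repeats)


X_over, y_over = oversample_by_duplication(X, y, bin_type == 1, times=5)

# Class counts after duplication (buried, above ground)
print(int((X_over[:, 0] == 1).sum()), int((X_over[:, 0] == 0).sum()))  # 330 752
```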

For some features, the linear regression coefficients were significantly higher, but the random forest feature importances were not.
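One point that may help when reading such a comparison: for ordinary least squares, duplicating rows gives the same coefficients as fitting the original rows with a weight equal to the number of copies. The sketch below checks this on synthetic data. The feature `density`, the simulated target, and the helper `ols_coefficients` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with the same imbalance: 66 buried (1) vs 752 above ground (0).
bin_type = np.array([1.0] * 66 + [0.0] * 752)
density = rng.normal(size=bin_type.size)  # hypothetical feature
# Hypothetical target; the simulated effect of bin_type is 3.0.
y = 20.0 + 3.0 * bin_type + 2.0 * density + rng.normal(scale=5.0, size=bin_type.size)


def ols_coefficients(features, target, weights=None):
    """Least-squares fit with an intercept; returns [intercept, coef_1, ...]."""
    A = np.column_stack([np.ones(len(target)), features])
    if weights is not None:
        # Weighted least squares: scale each row by the square root of its weight.
        root_w = np.sqrt(weights)
        A, target = A * root_w[:, None], target * root_w
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef


X = np.column_stack([bin_type, density])
repeats = np.where(bin_type == 1, 5, 1)  # each buried bin appears 5 times in total

coef_original = ols_coefficients(X, y)
coef_duplicated = ols_coefficients(np.repeat(X, repeats, axis=0), np.repeat(y, repeats))
coef_weighted = ols_coefficients(X, y, weights=repeats.astype(float))

print("original:  ", coef_original)
print("duplicated:", coef_duplicated)
print("weighted:  ", coef_weighted)
```

The duplicated and weighted fits agree up to floating-point error, so duplication only re-weights the existing buried bins and does not add new information about them.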

I would like to refine these results by using SMOTE, in order to conclude whether these features have an impact on the target.

I found that SMOTE can be used for regression with the smogn or resreg packages.

But I did not find how to apply it to the features rather than the target (here the imbalance is in the features, not in the target).

Do you know a way to solve this? (I mean: do you know whether I can change a parameter of SMOTE, or use another function, so that it acts on the features and not on the target?)
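To make the question concrete, here is a hand-rolled sketch of what "SMOTE on a feature" could mean: the imbalanced binary feature plays the role of the class label, and new minority rows are built by interpolating the remaining features and the target together between neighbouring minority rows. This is not the API of smogn, resreg, or any other package; the function `smote_like_oversample`, the feature `density`, and the data are all hypothetical.

```python
import numpy as np


def smote_like_oversample(rows, n_new, k=5, seed=0):
    """Create n_new synthetic rows, each interpolated between a randomly picked
    row and one of its k nearest neighbours (Euclidean distance) in `rows`."""
    rng = np.random.default_rng(seed)
    n = len(rows)
    k = min(k, n - 1)
    # Pairwise distances; a row must not be its own neighbour.
    dist = np.linalg.norm(rows[:, None, :] - rows[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random(size=(n_new, 1))  # interpolation factor in [0, 1)
    return rows[base] + gap * (rows[picked] - rows[base])


rng = np.random.default_rng(2)
bin_type = np.array([1] * 66 + [0] * 752)
density = rng.normal(size=bin_type.size)  # hypothetical feature
y = rng.normal(loc=50.0, scale=10.0, size=bin_type.size)  # hypothetical target

# Interpolate the other feature(s) and the target jointly, within buried bins only.
# Note: columns are on different scales here; scaling them first may be sensible.
minority = np.column_stack([density, y])[bin_type == 1]
synthetic = smote_like_oversample(minority, n_new=752 - 66)

density_bal = np.concatenate([density, synthetic[:, 0]])
y_bal = np.concatenate([y, synthetic[:, 1]])
bin_type_bal = np.concatenate([bin_type, np.ones(len(synthetic), dtype=int)])

print(int((bin_type_bal == 1).sum()), int((bin_type_bal == 0).sum()))  # 752 752
```

Whether interpolating the target this way is statistically sound for the question above is exactly what is unclear. A possibly related route, untested here, would be to pass the binary feature as the class label to a classification SMOTE implementation and keep the target among the interpolated columns.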

Elise1369