
This has been bothering me for quite some time.

If oversampling and undersampling both have their pros and cons, why not use them together to minimize their weaknesses?

I couldn't find a paper or an article that either uses both together or argues that we shouldn't use both simultaneously.

Joint usage would allow less oversampling of the minority class and less undersampling of the majority class, wouldn't it?
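To make the idea concrete, here is a minimal sketch of what I mean, assuming a binary problem and plain random resampling (not SMOTE or any library routine). The function name `resample_to_midpoint` and the choice of the midpoint as the shared target size are my own assumptions for illustration: the majority class is randomly undersampled down to the target, and the minority class is randomly oversampled (with replacement) up to it.

```python
import random
from collections import Counter


def resample_to_midpoint(X, y, seed=0):
    """Hypothetical helper: balance two classes by meeting in the middle.

    The majority class is undersampled (without replacement) and the
    minority class is oversampled (with replacement) to the same target
    size, here the midpoint of the two class counts.
    """
    rng = random.Random(seed)
    # Assumes exactly two classes are present in y.
    (maj_label, n_maj), (min_label, n_min) = Counter(y).most_common(2)
    target = (n_maj + n_min) // 2  # assumed target; any value in between works

    maj_idx = [i for i, label in enumerate(y) if label == maj_label]
    min_idx = [i for i, label in enumerate(y) if label == min_label]

    kept_maj = rng.sample(maj_idx, target)               # undersample majority
    extra_min = rng.choices(min_idx, k=target - n_min)   # duplicate minority rows

    idx = kept_maj + min_idx + extra_min
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data: 90 majority samples (label 0) and 10 minority samples (label 1).
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10

X_res, y_res = resample_to_midpoint(X, y)
print(sorted(Counter(y_res).items()))  # [(0, 50), (1, 50)]
```

Here the majority loses 40 of its 90 samples instead of 80, and the minority gains 40 duplicates instead of 80, compared with using either technique alone to reach balance.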

ayhan
user8397275
  • "Two wrongs don't make a right." – Scott Hunter Jan 29 '18 at 01:24
  • @Scott But they aren't "wrong" all the time. If they were always wrong, we shouldn't be using them at all. Sure, they inevitably cause information loss and often other issues. But we still use either of them to deal with imbalanced data. Why can't we use both at the same time? – user8397275 Jan 29 '18 at 01:47
  • Read about SMOTE - it is the hybrid method you are looking for: http://www.chioka.in/class-imbalance-problem/ – avchauzov Jan 29 '18 at 08:50
  • @avchauzov Thank you! I'm glad the article states that we can use both techniques, though doing so also brings the drawbacks of both. But I've used SMOTE before and never suspected it of implementing undersampling. (Perhaps because I've only applied the construction part.) The name only mentions oversampling, after all. I guess I should look into SMOTE more deeply. Again, thanks a lot! – user8397275 Jan 30 '18 at 02:55

0 Answers