My dataset has the following distribution:

class   frequency
0         960
1         2093
2         22696
3         1116
4         2541
5         1298
6         14

I am using imbalanced-learn (imblearn) to oversample the minority class. With regular SMOTE I can bring class 6 up to 200 samples, but with the borderline1 or borderline2 variants no new samples are generated.

from imblearn.over_sampling import SMOTE
sm = SMOTE(ratio={6: 200})

# output
>>> Presampled shape Counter({2: 22696, 4: 2541, 1: 2093, 5: 1298, 3: 1116, 0: 960, 6: 14})
>>> resampled shape Counter({2: 22696, 4: 2541, 1: 2093, 5: 1298, 3: 1116, 0: 960, 6: 200})



sm = SMOTE(kind='borderline1', ratio={6: 200})

# output
>>> Presampled shape Counter({2: 22696, 4: 2541, 1: 2093, 5: 1298, 3: 1116, 0: 960, 6: 14})
>>> resampled shape Counter({2: 22696, 4: 2541, 1: 2093, 5: 1298, 3: 1116, 0: 960, 6: 14})

sm = SMOTE(kind='borderline2', ratio={6: 200})

# output
>>> Presampled shape Counter({2: 22696, 4: 2541, 1: 2093, 5: 1298, 3: 1116, 0: 960, 6: 14})
>>> resampled shape Counter({2: 22696, 4: 2541, 1: 2093, 5: 1298, 3: 1116, 0: 960, 6: 14})

Is there a mathematical reason for this, or am I missing something?

sophros
Pratik Kumar
  • Maybe this can help: https://github.com/scikit-learn-contrib/imbalanced-learn/issues/36 – Vivek Kumar Jan 24 '18 at 07:32
  • @VivekKumar thanks for the link; it seems the cause is mathematical. When I tried `sm=SMOTE(kind='borderline1',ratio={6:200,4:4000,0:2000})`, samples were generated for class 0 and class 4 but not for class 6. Maybe there is too little information for the SMOTE variants to generate samples. – Pratik Kumar Jan 24 '18 at 08:23
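
One possible explanation for the unchanged class 6 count: Borderline-SMOTE, as described by Han et al. (2005), only interpolates from minority samples that are "in danger", meaning at least half but not all of their m nearest neighbours belong to other classes. Samples whose neighbours are all from other classes are treated as noise, and samples surrounded mostly by their own class are treated as safe. If none of the 14 samples falls in the danger group, there is nothing to generate from. The sketch below illustrates that rule with plain NumPy. It is not imblearn's implementation, the name `borderline_labels` is hypothetical, and imblearn's exact thresholds and neighbour counts may differ.

```python
import numpy as np


def borderline_labels(X, y, minority_class, m_neighbors=10):
    """Label each minority sample 'noise', 'danger' or 'safe'.

    Hypothetical helper illustrating the Borderline-SMOTE rule: count how
    many of the m nearest neighbours belong to a different class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority_class)
    # Brute-force Euclidean distances from each minority sample to every sample.
    dist = np.linalg.norm(X[min_idx][:, None, :] - X[None, :, :], axis=2)
    # A sample must not count as its own neighbour.
    dist[np.arange(len(min_idx)), min_idx] = np.inf
    nearest = np.argsort(dist, axis=1)[:, :m_neighbors]
    n_other = (y[nearest] != minority_class).sum(axis=1)
    return np.where(
        n_other == m_neighbors,
        "noise",
        np.where(n_other >= m_neighbors / 2, "danger", "safe"),
    )


def on_x_axis(points):
    """Place 1-D positions on the x-axis of a 2-D plane (toy data only)."""
    points = np.asarray(points, dtype=float)
    return np.column_stack([points, np.zeros(len(points))])


# Toy data, not the dataset from the question: 200 majority points (class 2)
# and 14 minority points (class 6).
majority = on_x_axis(np.arange(200))
y = np.array([2] * 200 + [6] * 14)

# Layout 1: the 14 minority points form a tight cluster far from the majority.
cluster = on_x_axis(1000 + np.arange(14))
labels_cluster = borderline_labels(np.vstack([majority, cluster]), y, 6)

# Layout 2: each minority point sits alone in the middle of the majority.
scattered = on_x_axis(7.5 + 14 * np.arange(14))
labels_scattered = borderline_labels(np.vstack([majority, scattered]), y, 6)

print("cluster:  ", labels_cluster)
print("scattered:", labels_scattered)
```

In both toy layouts no sample ends up in the danger group (the cluster is entirely safe, the scattered points are entirely noise), so a borderline variant would have nothing to interpolate from. Running the same kind of check on the real class 6 samples would show whether this applies to the dataset in the question.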
