
I am using several imblearn over-sampling methods on a dataset that contains ~55800 samples. About 200 belong to class 1, the rest to class 0. I am oversampling class 1 with various sampling strategies.

It does not improve my model quality, so I want to take a closer look at the generated samples. But how can I access them? Is there any way to get the indices of the created ones?

Looping through the sample lists before and after sampling and filtering out the non-duplicates is far too demanding and freezes my laptop.

desertnaut
Andreas bleYel
  • Did some tests with smaller arrays. Made an array of size 200 and resampled it with ROS and SMOTE with sampling strategy 0.25. All the new samples in the resampled array were at indexes 200-224. I guess the new ones just get appended. – Andreas bleYel Apr 20 '20 at 10:27
  • Seems that it was possible in older versions, but it is now deprecated: [How to get sample indices from RandomUnderSampler in imblearn](https://stackoverflow.com/questions/60762538/how-to-get-sample-indices-from-randomundersampler-in-imblearn). – desertnaut May 16 '20 at 14:41

1 Answer


There is no built-in function in imblearn that returns the indices for oversampling, as far as I know. Therefore, the only solution is to get the indices by comparing before and after, as you suggested. To avoid freezing your laptop, you can drop most of the majority-class samples, since they are not used to create the oversampled samples of the minority class (at least not for random oversampling or plain SMOTE).

So let's say you delete all but 500 samples of class 0, keep all 200 samples of class 1, perform the SMOTE oversampling, and then compare as you tried before. With this number of samples it shouldn't freeze your laptop, and you can get an idea of what the oversampled samples look like.
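
A rough sketch of that idea, using only NumPy. The helper names `subsample_majority` and `new_row_indices` are made up for illustration and are not part of imblearn. The comparison hashes each row instead of looping pairwise, which should stay cheap even for larger arrays. The over-sampler itself is replaced here by a stand-in that interpolates between minority rows and appends the result, which is an assumption about how SMOTE-style output looks, not imblearn's actual code.

```python
import numpy as np


def subsample_majority(X, y, n_majority=500, majority_label=0, seed=0):
    """Keep every non-majority row and a random subset of majority rows."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    n_keep = min(n_majority, maj_idx.size)
    keep_maj = rng.choice(maj_idx, size=n_keep, replace=False)
    keep = np.sort(np.concatenate([keep_maj, min_idx]))
    return X[keep], y[keep]


def new_row_indices(X_before, X_after):
    """Indices of rows in X_after whose values do not occur in X_before.

    Rows are compared by their raw bytes, so both arrays must share a dtype.
    """
    seen = {row.tobytes() for row in np.ascontiguousarray(X_before)}
    after = np.ascontiguousarray(X_after)
    return np.array(
        [i for i, row in enumerate(after) if row.tobytes() not in seen],
        dtype=int,
    )


# Toy data: 200 rows of class 1, the rest class 0 (sizes are illustrative).
rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 4))
y = np.zeros(5000, dtype=int)
y[:200] = 1

X_small, y_small = subsample_majority(X, y, n_majority=500)

# Stand-in for an over-sampler (assumption): interpolate between pairs of
# distinct minority rows and append the synthetic rows at the end.
X_min = X_small[y_small == 1]
lam = rng.uniform(0.1, 0.9, size=(50, 1))
synthetic = X_min[:50] + lam * (X_min[1:51] - X_min[:50])
X_res = np.vstack([X_small, synthetic])

idx = new_row_indices(X_small, X_res)
generated = X_res[idx]  # the rows to inspect
```

With imblearn installed, the stand-in would presumably be replaced by something like `X_res, y_res = SMOTE().fit_resample(X_small, y_small)`. Note that a content comparison like this cannot find the rows added by random oversampling, because those are exact copies of existing rows. In that case the observation from the comments (new rows appear after the original ones) suggests slicing `X_res[len(X_small):]` instead, though that ordering is worth verifying for your imblearn version.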

ramobal