How to apply undersampling or oversampling to a dataset in Python?

Asked Jun 27 '20 at 13:02

Active Jun 28 '20 at 13:46

Viewed 1,548 times

I have an imbalanced dataset and I'm trying to apply undersampling.

If nobody has a fix for my error, any alternative approach would be appreciated.

This is what I've done:

from imblearn.under_sampling import RandomUnderSampler
rus = RandomUnderSampler(random_state=0)
X_train_resampled, y_train_resampled = rus.fit_sample(X_train, y_train)

However, I keep getting the error:

AttributeError: 'RandomUnderSampler' object has no attribute '_validate_data'

I saw the post "'RandomUnderSampler' object has no attribute 'fit_resample'", but its answer didn't work for me. Upgrading the library didn't help either. I also tried fit_resample and got exactly the same error.

Any ideas on how to fix this error, or on another way of applying undersampling?

UPDATE: The full error is below (I can't show the real data because of privacy concerns).

Regarding versions: I'm using Python 3.7 and scikit-learn 0.23.1.
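
One thing worth ruling out (an assumption, since the full traceback isn't shown here): as far as I know, _validate_data is a method that scikit-learn estimators gained in version 0.23, so this AttributeError usually suggests that the interpreter running the code imports an older scikit-learn than the one that was upgraded, for example a different environment or a notebook kernel that was not restarted. A minimal sketch for checking what the running interpreter actually sees; installed_version is a hypothetical helper, not part of either library:

```python
import importlib
import sys


def installed_version(module_name):
    """Return module.__version__ as seen by this interpreter, or None."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Not installed, or installed but failing to import (e.g. a
        # mismatched dependency); either way there is no usable version.
        return None
    return getattr(module, "__version__", None)


# The interpreter path shows which environment is really in use.
print("interpreter:", sys.executable)
for name in ("sklearn", "imblearn"):
    print(name, installed_version(name))
```

If the printed scikit-learn version is lower than 0.23, or the interpreter path is not the environment that was upgraded, restarting the kernel or reinstalling into that environment would be the first thing to try.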

edited Jun 28 '20 at 13:46

Dumb ML

You're not posting the entirety of your code, since the error message clearly originated from somewhere outside the snippet you've posted. Also post the error message in more detail. – michalwa Jun 27 '20 at 13:30
@michalwa I can't show the data, but I updated the question with the printout. Also, there isn't anything unusual in the data, just a bunch of dummy variables – Dumb ML Jun 27 '20 at 15:15
Can you try under-sampling a synthetic dataset, such as x = np.random.rand(m, n)? Also, can you check that your imblearn version is compatible with your Python version? – a_jelly_fish Jun 27 '20 at 15:30
@achow good idea, just did that. – Dumb ML Jun 28 '20 at 13:46
Can you please try pip install -U imblearn? I'm unsure of your current version. – a_jelly_fish Jun 28 '20 at 16:29
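
As for the alternative the question asks about: random undersampling can be done by hand with NumPy alone, which sidesteps the imblearn import while the version problem is being sorted out. The following is a minimal sketch on synthetic data, as suggested in the comments. It is not imblearn's implementation; random_undersample is a hypothetical helper, and it assumes the goal is to cut every class down to the size of the rarest one.

```python
import numpy as np


def random_undersample(X, y, random_state=0):
    """Randomly drop rows so each class keeps as many samples as the rarest class."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.RandomState(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()  # size of the smallest class
    # For each class, pick n_keep row indices without replacement.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]


# Synthetic imbalanced data: 900 samples of class 0, 100 of class 1.
rng = np.random.RandomState(0)
X = rng.rand(1000, 5)
y = np.array([0] * 900 + [1] * 100)

X_res, y_res = random_undersample(X, y)
print(X_res.shape)         # (200, 5)
print(np.bincount(y_res))  # [100 100]
```

Note that np.asarray turns a pandas DataFrame into a plain array, so column names are lost; apply the sampler to the training split only, never to the test data.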
