Questions tagged [imblearn]

Python imbalanced-learning package. To improve the results or the speed of machine-learning algorithms on datasets where one or more classes have significantly fewer (or more) training examples than the others, you can use an imbalanced-learning approach. Imbalanced-learning methods use re-sampling techniques such as SMOTE, ADASYN, Tomek links, and their various combinations.

imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.

Most classification algorithms will only perform optimally when the number of samples in each class is roughly the same. Highly skewed datasets, where the minority class is heavily outnumbered by one or more other classes, have proven to be a challenge while at the same time becoming more and more common.

One way of addressing this issue is to re-sample the dataset so as to offset the imbalance, in the hope of arriving at a more robust and fair decision boundary than you would otherwise.

Re-sampling techniques are divided into four categories:

    Under-sampling the majority class(es).
    Over-sampling the minority class.
    Combining over- and under-sampling.
    Creating ensembles of balanced sets.

Below is a list of the methods currently implemented in this module; a short usage sketch follows the list.

Under-sampling

  1. Random majority under-sampling with replacement
  2. Extraction of majority-minority Tomek links
  3. Under-sampling with Cluster Centroids
  4. NearMiss-(1 & 2 & 3)
  5. Condensed Nearest Neighbour
  6. One-Sided Selection
  7. Neighbourhood Cleaning Rule
  8. Edited Nearest Neighbours
  9. Instance Hardness Threshold
  10. Repeated Edited Nearest Neighbours
  11. AllKNN

Over-sampling

  12. Random minority over-sampling with replacement
  13. SMOTE - Synthetic Minority Over-sampling Technique
  14. bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
  15. SVM SMOTE - Support Vectors SMOTE
  16. ADASYN - Adaptive synthetic sampling approach for imbalanced learning

  17. Over-sampling followed by under-sampling

    • SMOTE + Tomek links
    • SMOTE + ENN
  18. Ensemble classifier using samplers internally

    • EasyEnsemble
    • BalanceCascade
    • Balanced Random Forest
    • Balanced Bagging
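
All of the samplers listed above share the same basic API. A minimal usage sketch, assuming a synthetic dataset and a recent imbalanced-learn release in which samplers expose fit_resample:

    # Minimal sketch; the dataset is synthetic and purely illustrative.
    from collections import Counter

    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import RandomUnderSampler

    X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=42)
    print("original:", Counter(y))

    # Over-sample the minority class with SMOTE...
    X_over, y_over = SMOTE(random_state=42).fit_resample(X, y)
    print("after SMOTE:", Counter(y_over))

    # ...or under-sample the majority class instead.
    X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)
    print("after random under-sampling:", Counter(y_under))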

Resources:

  • GitHub repository: https://github.com/scikit-learn-contrib/imbalanced-learn
  • Documentation: https://imbalanced-learn.org

205 questions
6 votes, 1 answer

python imblearn make_pipeline TypeError: Last step of Pipeline should implement fit

I am trying to implement SMOTE of imblearn inside the Pipeline. My data sets are text data stored in pandas dataframe. Please see below the code snippet text_clf =Pipeline([('vect', TfidfVectorizer()),('scale',…
pythondumb
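
This error typically appears when a sampler such as SMOTE is placed inside scikit-learn's own Pipeline. A hedged sketch of the usual remedy, using imblearn's make_pipeline, which does accept samplers; the classifier and the train_docs/train_labels names are illustrative assumptions:

    # Samplers are only supported by imblearn's Pipeline / make_pipeline,
    # not by sklearn.pipeline.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline

    text_clf = make_pipeline(
        TfidfVectorizer(),
        SMOTE(random_state=0),               # resampling is applied to the training folds only
        LogisticRegression(max_iter=1000),
    )
    # text_clf.fit(train_docs, train_labels)  # train_docs / train_labels are placeholders
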
5 votes, 1 answer

RandomUnderSampler' object has no attribute 'fit_resample'

I am using RandomUnderSampler from imblearn, but I get the following error. Any ideas? Thanks from imblearn.under_sampling import RandomUnderSampler print('Initial dataset shape %s' % Counter(y.values.squeeze())) rus =…
hsbr13
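
fit_resample was introduced around imbalanced-learn 0.4; older releases only had fit_sample, so this AttributeError usually just signals an outdated installation. A minimal sketch assuming a recent version and a synthetic dataset:

    from collections import Counter

    from sklearn.datasets import make_classification
    from imblearn.under_sampling import RandomUnderSampler

    X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)
    print('Initial dataset shape %s' % Counter(y))

    rus = RandomUnderSampler(random_state=0)
    X_res, y_res = rus.fit_resample(X, y)
    print('Resampled dataset shape %s' % Counter(y_res))
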
5 votes, 2 answers

SMOTE with missing values

I am trying to use SMOTE from imblearn package in Python, but my data has a lot of missing values and I got the following error: ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). I checked the parameters here, and…
MJeremy
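
SMOTE itself rejects NaN, so a common pattern is to impute before resampling. A hedged sketch; SimpleImputer with the median strategy is an illustrative choice, not the only sensible one:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.impute import SimpleImputer
    from imblearn.over_sampling import SMOTE

    X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)
    X[::20, 0] = np.nan                          # inject some missing values

    # Impute first, then resample: SMOTE cannot interpolate between rows with NaN.
    X_imputed = SimpleImputer(strategy="median").fit_transform(X)
    X_res, y_res = SMOTE(random_state=0).fit_resample(X_imputed, y)
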
4 votes, 1 answer

TypeError: __init__() got an unexpected keyword argument 'ratio' when using SMOTE

I am using SMOTE to oversample as my dataset is imbalanced. I am getting an unexpected argument error. But in the documentation, the ratio argument is defined for SMOTE. Can someone help me understand where I am going wrong? Code snippet from…
anushiya-thevapalan
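
The ratio argument was deprecated and later removed; recent imbalanced-learn releases take sampling_strategy instead. A minimal sketch on synthetic data (the 0.5 value is illustrative):

    # sampling_strategy replaces the removed ratio keyword; it accepts a float
    # (binary problems), a string such as "minority", or a dict.
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)

    sm = SMOTE(sampling_strategy=0.5, random_state=0)  # desired minority/majority ratio
    X_res, y_res = sm.fit_resample(X, y)
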
4 votes, 1 answer

How to implement RandomUnderSampler in a scikit-learn pipeline?

I have a scikit learn pipeline to scale numeric features and encode categorical features. It was working fine until I tried to implement the RandomUnderSampler from imblearn. My goal is to implement the undersampler step since my dataset is very…
Ale M.
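
A hedged sketch of one way to wire this up: preprocessing goes into a ColumnTransformer and everything is wrapped in imblearn's Pipeline, since scikit-learn's Pipeline refuses sampler steps. The column names and toy data are assumptions for illustration:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from imblearn.pipeline import Pipeline
    from imblearn.under_sampling import RandomUnderSampler

    df = pd.DataFrame({
        "age": [23, 45, 31, 52, 40, 29, 60, 35],
        "city": ["a", "b", "a", "c", "b", "a", "c", "b"],
        "target": [0, 0, 0, 0, 0, 0, 1, 1],
    })
    X, y = df[["age", "city"]], df["target"]

    preprocess = ColumnTransformer([
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ])

    clf = Pipeline([
        ("prep", preprocess),
        ("under", RandomUnderSampler(random_state=0)),  # runs on training data only
        ("model", LogisticRegression()),
    ])
    clf.fit(X, y)
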
4 votes, 3 answers

When to do feature selection in an imblearn pipeline with cross-validation and grid search

Currently I am building a classifier with heavily imbalanced data. I am using the imblearn pipeline to first to StandardScaling, SMOTE, and then the classification with gridSearchCV. This ensures that the upsampling is done during the…
Joost Jansen
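
The usual recommendation is to keep the selector inside the imblearn Pipeline, so that scaling, SMOTE and feature selection are all re-fit on each training fold and nothing leaks from the held-out fold. A minimal sketch; the classifier and the parameter grid are illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.preprocessing import StandardScaler
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline

    X, y = make_classification(n_samples=500, n_features=20, weights=[0.9, 0.1],
                               random_state=0)

    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("smote", SMOTE(random_state=0)),
        ("select", SelectKBest(f_classif)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    grid = GridSearchCV(pipe, {"select__k": [5, 10], "clf__C": [0.1, 1.0]},
                        scoring="f1", cv=5)
    grid.fit(X, y)
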
4 votes, 1 answer

How to oversample image dataset using Python?

I am working on a multiclass classification problem with an unbalanced dataset of images(different class). I tried imblearn library, but it is not working on the image dataset. I have a dataset of images belonging to 3 class namely A,B,C. A has 1000…
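
imblearn samplers expect a 2-D (n_samples, n_features) matrix, which is why they fail on raw image arrays. A hedged sketch of the flatten, resample, reshape workaround; the shapes below are assumptions, and for images class-aware augmentation is often a better fit than SMOTE on raw pixels:

    import numpy as np
    from imblearn.over_sampling import RandomOverSampler

    # Toy stand-in for an image dataset: 60 RGB images of 32x32 pixels with
    # an imbalanced three-class label vector (classes A/B/C encoded as 0/1/2).
    images = np.random.rand(60, 32, 32, 3)
    labels = np.array([0] * 40 + [1] * 15 + [2] * 5)

    flat = images.reshape(len(images), -1)                  # (60, 3072)
    X_res, y_res = RandomOverSampler(random_state=0).fit_resample(flat, labels)
    images_res = X_res.reshape(-1, 32, 32, 3)               # back to image shape
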
4 votes, 3 answers

Feature Importance using Imbalanced-learn library

The imblearn library is a library used for unbalanced classifications. It allows you to use scikit-learn estimators while balancing the classes using a variety of methods, from undersampling to oversampling to ensembles. My question is however, how…
mamafoku
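
Resampling does not change how importances are read: fit the imblearn Pipeline and inspect the final estimator through named_steps. A minimal sketch with an illustrative random forest on synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline

    X, y = make_classification(n_samples=500, n_features=10, weights=[0.9, 0.1],
                               random_state=0)

    pipe = Pipeline([("smote", SMOTE(random_state=0)),
                     ("rf", RandomForestClassifier(random_state=0))])
    pipe.fit(X, y)

    # The forest was trained on the resampled data; read its importances directly.
    print(pipe.named_steps["rf"].feature_importances_)
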
4 votes, 2 answers

How can I generate categorical synthetic samples with imblearn and SMOTE?

I am looking to generate synthetic samples for a machine learning algorithm using imblearn's SMOTE. I have a few categorical features which I have converted to integers using sklearn preprocessing.LabelEncoder. The problem that I have is that when…
S Hoult
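
Plain SMOTE interpolates label-encoded categories into meaningless in-between values; SMOTENC is the variant intended for mixed categorical/continuous data and takes the indices of the categorical columns. A minimal sketch on toy data:

    import numpy as np
    from imblearn.over_sampling import SMOTENC

    # Column 0 is continuous, column 1 is a label-encoded categorical feature.
    X = np.column_stack([
        np.random.rand(100),
        np.random.randint(0, 3, size=100),
    ])
    y = np.array([0] * 90 + [1] * 10)

    sm = SMOTENC(categorical_features=[1], random_state=0)
    X_res, y_res = sm.fit_resample(X, y)
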
3 votes, 4 answers

How to resolve "cannot import name '_MissingValues' from 'sklearn.utils._param_validation'" issue when trying to import imblearn?

I am trying to import imblearn into my python notebook after installing the required modules. However, I am getting the following error: Additional info: I am using a virtual environment in Visual Studio Code. I've made sure that venv was selected…
user22158562
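
This import error usually means the installed imbalanced-learn release targets a different scikit-learn version than the one in the environment (the private _MissingValues helper moved between scikit-learn releases). A small sketch for checking the installed versions without triggering the failing import; the fix is typically to upgrade both packages together inside the active venv:

    # Check versions without importing imblearn (which is what fails here).
    from importlib.metadata import version

    print("scikit-learn:", version("scikit-learn"))
    print("imbalanced-learn:", version("imbalanced-learn"))
    # e.g. pip install -U scikit-learn imbalanced-learn
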
3 votes, 1 answer

Why does SMOTE not work with more than 15 features / What method does work with more than 15 features?

I'm currently implementing machine learning using SMOTE from imblearn.over_sampling, and as I'm synthesizing data for it, I see a very noticeable cutoff for when the SMOTE method breaks. When I synthesize data using the following code and run it…
3 votes, 7 answers

Cannot import name 'available_if' from 'sklearn.utils.metaestimators'

While importing "from imblearn.over_sampling import SMOTE", getting import error. Please check and help. I tried upgrading sklearn, but the upgrade was undone with 'OSError'. Firsty installed imbalance-learn through pip. !pip install -U…
Piyush
3 votes, 1 answer

The difference between smote.fit_sample() and smote.fit_resample()

In imblearn, what is the difference between smote.fit_sample() and smote.fit_resample(), and when should we use one over the other?
Manish KC
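
They do the same thing: fit_sample was the original name, kept for a while as a deprecated alias of fit_resample and later removed, so fit_resample is the one to use on current imbalanced-learn releases. A minimal sketch:

    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)   # current API
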
3 votes, 1 answer

'RandomOverSampler' object has no attribute '_validate_data'

Hi I am getting following error can anyone suggest me what could be wrong? When I am calling, os.fit_sample(X,y) 'RandomOverSampler' object has no attribute '_validate_data'
Vikas Singh
3 votes, 0 answers

How to save model after sklearn gridsearchcv using imblearn pipeline : TypeError: can't pickle _thread.RLock objects

The problem i am facing is this that I have performed grid search using imblearn pipeline and using sklearn gridsearchcv as I was dealing with an extremely unbalanced dataset, but when I try to save the model , I am getting the error 'TypeError:…
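
One common workaround, sketched below on toy data, is to persist only the fitted best_estimator_ with joblib rather than the whole GridSearchCV object; this assumes the pipeline steps themselves are picklable, since unpicklable attributes (loggers, open handles, wrapped Keras models) are a frequent source of the RLock error. All names and data here are illustrative:

    import joblib
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline

    X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

    pipe = Pipeline([("smote", SMOTE(random_state=0)),
                     ("clf", LogisticRegression(max_iter=1000))])
    grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0]}, scoring="f1", cv=3)
    grid.fit(X, y)

    # Persist just the fitted pipeline, not the surrounding GridSearchCV object.
    joblib.dump(grid.best_estimator_, "best_model.joblib")
    model = joblib.load("best_model.joblib")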