Questions tagged [imblearn]

Python imbalanced-learn package. When one or more classes in a dataset has significantly fewer (or more) training examples than the others, an imbalanced-learning approach can improve the results or the speed of the learning process of machine learning algorithms. Imbalanced-learning methods use re-sampling techniques such as SMOTE, ADASYN, Tomek links, and their various combinations.

imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.
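For illustration, here is a minimal sketch of the shared sampler API; the toy dataset built with scikit-learn's make_classification is an assumption for the example, not part of the package description:

    # Minimal sketch of the common sampler API; the toy dataset is illustrative.
    from collections import Counter

    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # A 90/10 two-class dataset stands in for a real imbalanced problem.
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
    print(Counter(y))                     # roughly {0: 900, 1: 100}

    # Every sampler exposes fit_resample, mirroring scikit-learn's fit/transform.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    print(Counter(y_res))                 # classes are balanced after resampling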

Most classification algorithms perform optimally only when the number of samples in each class is roughly the same. Highly skewed datasets, where the minority class is heavily outnumbered by one or more majority classes, have proven to be a challenge while at the same time becoming more and more common.

One way of addressing this issue is to re-sample the dataset so as to offset the imbalance, in the hope of arriving at a more robust and fair decision boundary than you would otherwise.

Re-sampling techniques fall into four categories:

    1. Under-sampling the majority class(es).
    2. Over-sampling the minority class.
    3. Combining over- and under-sampling.
    4. Creating ensemble balanced sets.

Below is a list of the methods currently implemented in this module; a short usage sketch of the combined approach follows the list.

Under-sampling

  1. Random majority under-sampling with replacement
  2. Extraction of majority-minority Tomek links
  3. Under-sampling with Cluster Centroids
  4. NearMiss-(1 & 2 & 3)
  5. Condensed Nearest Neighbour
  6. One-Sided Selection
  7. Neighbourhood Cleaning Rule
  8. Edited Nearest Neighbours
  9. Instance Hardness Threshold
  10. Repeated Edited Nearest Neighbours
  11. AllKNN

Over-sampling

  1. Random minority over-sampling with replacement
  2. SMOTE - Synthetic Minority Over-sampling Technique
  3. bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
  4. SVM SMOTE - Support Vectors SMOTE
  5. ADASYN - Adaptive synthetic sampling approach for imbalanced learning

Over-sampling followed by under-sampling

    • SMOTE + Tomek links
    • SMOTE + ENN

Ensemble classifier using samplers internally

    • EasyEnsemble
    • BalanceCascade
    • Balanced Random Forest
    • Balanced Bagging

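As a concrete example of the third category (over-sampling followed by under-sampling), here is a minimal sketch using the combined SMOTETomek sampler; the toy dataset is a stand-in:

    # SMOTETomek first over-samples the minority class with SMOTE, then
    # removes Tomek links to clean up the class boundary.
    from sklearn.datasets import make_classification
    from imblearn.combine import SMOTETomek

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
    X_res, y_res = SMOTETomek(random_state=0).fit_resample(X, y)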
Resources:

    • GitHub repository: https://github.com/scikit-learn-contrib/imbalanced-learn
    • Documentation: https://imbalanced-learn.org

205 questions
2 votes, 1 answer

Use imblearn to plot a ROC curve

I'm trying to use imblearn to plot a ROC curve but run into some problems. Here's a screenshot of my data. from imblearn.over_sampling import SMOTE, ADASYN from collections import Counter import pandas as pd import numpy as np import…
yihao ren
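A minimal sketch of one way to approach this, assuming the usual pattern of resampling only the training split; the classifier and data are placeholders, not the asker's:

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import RocCurveDisplay
    from sklearn.model_selection import train_test_split
    from imblearn.over_sampling import SMOTE

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    # Resample the training data only; evaluate on the untouched test set.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
    clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)

    RocCurveDisplay.from_estimator(clf, X_test, y_test)
    plt.show()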
2 votes, 1 answer

Using VotingClassifier with other classifiers inside a Sklearn Pipeline

I want to use the VotingClassifier inside a sklearn Pipeline, where I defined a set of classifiers. I got some intuition from this question: Using VotingClassifier in Sklearn Pipeline to build the code below, but in this question each of the…
Minions
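A minimal sketch of the pattern, assuming an imblearn pipeline with placeholder estimators:

    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline

    # The VotingClassifier is simply the final estimator of the pipeline.
    voting = VotingClassifier(
        estimators=[('lr', LogisticRegression(max_iter=1000)),
                    ('rf', RandomForestClassifier(random_state=0))],
        voting='soft')

    pipe = Pipeline(steps=[('scaler', StandardScaler()),
                           ('smote', SMOTE(random_state=0)),
                           ('voting', voting)])
    # pipe.fit(X_train, y_train); pipe.predict(X_test)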
2 votes, 2 answers

How to use Random Undersampler with ratio = 'dict' in imblearn?

I am trying to deal with imbalanced data set using imblearn's random under-sampler. I want to specify the number of labels to be under-sampled manually. Here is my code: sm = RandomUnderSampler(ratio = {0:142498, 1: 495}, random_state=42) X_train,…
Saurav--
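For reference, the ratio parameter was renamed sampling_strategy in imbalanced-learn 0.4; a minimal sketch using the class counts from the question's code:

    from imblearn.under_sampling import RandomUnderSampler

    # Keep 142498 samples of class 0 and 495 of class 1 after resampling.
    rus = RandomUnderSampler(sampling_strategy={0: 142498, 1: 495}, random_state=42)
    # X_res, y_res = rus.fit_resample(X_train, y_train)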
2 votes, 0 answers

Undersampling for multilabel imbalanced datasets in pandas

I'm working on a roll-your-own undersampling function, since imblearn does not work neatly with multi-label classification (e.g. it only accepts one dimensional y). I want to iterate through X and y, removing a row every 2 or 3 rows that are part…
tw0000
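A minimal sketch of one hand-rolled approach, assuming the rule is "drop all but every k-th row among the majority rows"; the majority mask and k are assumptions standing in for the question's truncated criterion:

    import numpy as np
    import pandas as pd

    def undersample_multilabel(X: pd.DataFrame, y: pd.DataFrame,
                               majority_mask: np.ndarray, keep_every: int = 2):
        """Drop all but every keep_every-th row among the masked (majority) rows."""
        majority_idx = np.flatnonzero(majority_mask)
        drop_idx = majority_idx[np.arange(len(majority_idx)) % keep_every != 0]
        keep = np.setdiff1d(np.arange(len(X)), drop_idx)
        return X.iloc[keep], y.iloc[keep]

    # Example: treat rows whose label vector is all zeros as the majority.
    # mask = (y == 0).all(axis=1).to_numpy()
    # X_res, y_res = undersample_multilabel(X, y, mask)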
1 vote, 0 answers

ImageDataGenerator.flow_from_dataframe still has problems with Overfitting

I have an image dataset of 2432 images, each belonging to one of a total of 3 categories. The labels are stored in a CSV file with the image id and the label (T1). The distribution of the data is: negative 1695, positive 648, neutral 89. I'm trying to…
1 vote, 1 answer

Mitigation for imblearn pipelines

I'm trying to mitigate unfairness for a model I trained using an imblearn pipeline with ADASYN. My pipeline looks like this: loaded_model = Pipeline(steps=[('feature_scaler', StandardScaler()), ('adasyn_resampling',…
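A minimal sketch completing a pipeline of that shape; only the two named steps come from the question, and the final estimator is a placeholder:

    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler
    from imblearn.over_sampling import ADASYN
    from imblearn.pipeline import Pipeline

    loaded_model = Pipeline(steps=[
        ('feature_scaler', StandardScaler()),
        ('adasyn_resampling', ADASYN(random_state=42)),
        ('classifier', LogisticRegression(max_iter=1000)),  # placeholder estimator
    ])
    # Resampling runs only during fit; predict and score skip the sampler.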
1 vote, 1 answer

Imblearn pipeline with SMOTE step - AttributeError: This 'Pipeline' has no attribute 'transform'

As part of an assignment, I have been trying to whip up a pipeline to preprocess some data that I have. Said data has a total of five classes, one of which is imbalanced compared to the others, and therefore I decided to apply SMOTE for it. The code…
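The usual cause: samplers implement fit_resample rather than transform, so they cannot sit inside sklearn.pipeline.Pipeline; imblearn ships its own Pipeline that accepts them. A minimal sketch, with a placeholder estimator:

    from sklearn.tree import DecisionTreeClassifier
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline  # note: imblearn's Pipeline, not sklearn's

    pipe = Pipeline(steps=[('smote', SMOTE(random_state=0)),
                           ('clf', DecisionTreeClassifier(random_state=0))])
    # pipe.fit(X_train, y_train) applies SMOTE during fit only.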
1 vote, 0 answers

Running sampling in scikit-learn with imblearn in parallel

I just noticed that the over-/under-sampler methods from the imbalanced-learn (imblearn) package now give a future deprecation warning for running in parallel (the n_jobs=x argument): FutureWarning: The parameter n_jobs has been deprecated in 0.10 and…
Björn
1 vote, 1 answer

Why does installing imblearn with pip fail?

I am trying to install the Python package "imblearn" to balance datasets, with the command pip install imblearn, but it keeps failing. I tried from cmd and from PowerShell with admin privileges, with the regular pip command, and with git clone to the…
Ron Keinan
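A point worth checking first for errors like this: the package's canonical name on PyPI is imbalanced-learn (imblearn is only the import name), so pip install imbalanced-learn is the documented installation command.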
1 vote, 2 answers

StratifiedKFold and Over-Sampling together

I have a machine learning model and a dataset with 15 features about breast cancer. I want to predict the status of a person (alive or dead). I have 85% alive cases and only 15% dead. So, I want to use over-sampling for dealing with this problem and…
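A minimal sketch of the standard pattern, with placeholder data and estimator: put the over-sampler inside an imblearn pipeline so each StratifiedKFold training split is resampled independently while the validation split stays untouched:

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline

    # SMOTE runs inside each fold's fit, so validation data is never resampled.
    pipe = Pipeline(steps=[('smote', SMOTE(random_state=0)),
                           ('clf', LogisticRegression(max_iter=1000))])
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    # scores = cross_val_score(pipe, X, y, cv=cv, scoring='f1')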
1 vote, 1 answer

Is there a parameter for GridSearchCV to select the best with the lowest difference between train and test set?

My goal is to get a well-fitted model (train and test set metric differences of only 1%-5%), because the Random Forest tends to overfit (with default params, the train set f1 score for class 1 is 1.0). The problem is that GridSearchCV only considers…
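There is no built-in parameter for this, but GridSearchCV's refit argument accepts a callable that picks the winning index from cv_results_, which allows ranking by the train/validation gap. A minimal sketch; the 5-point threshold and the 'f1' scorer are assumptions, and return_train_score must be enabled:

    import numpy as np
    from sklearn.model_selection import GridSearchCV

    def smallest_gap(cv_results):
        """Best validation score among candidates with a small train/validation gap."""
        train = np.asarray(cv_results['mean_train_score'])
        test = np.asarray(cv_results['mean_test_score'])
        ok = np.flatnonzero(train - test <= 0.05)   # gap of at most 5 points
        candidates = ok if ok.size else np.arange(len(test))
        return int(candidates[np.argmax(test[candidates])])

    # search = GridSearchCV(estimator, param_grid, scoring='f1',
    #                       return_train_score=True, refit=smallest_gap)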
1 vote, 0 answers

SHAP with an imblearn pipeline

How can I use SHAP after using imblearn pipeline? This is my code: pipeline_adaboost = Pipeline([('smt', SMOTE(random_state=42)), ('adaboost', AdaBoostClassifier(random_state=42))]) adaboost_parameters =…
new_data
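One workable pattern, sketched under assumptions (the step names come from the question; the data variables are placeholders): fit the imblearn pipeline, then explain the fitted final estimator directly, since the SMOTE step only acts during fit and does not change the feature space at prediction time:

    import shap
    from sklearn.ensemble import AdaBoostClassifier
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline

    pipeline_adaboost = Pipeline([('smt', SMOTE(random_state=42)),
                                  ('adaboost', AdaBoostClassifier(random_state=42))])
    # pipeline_adaboost.fit(X_train, y_train)

    # Explain the fitted classifier; SMOTE is irrelevant at prediction time.
    # explainer = shap.Explainer(
    #     pipeline_adaboost.named_steps['adaboost'].predict_proba, X_train)
    # shap_values = explainer(X_test)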
1 vote, 1 answer

.fit : AttributeError in python3, using imblearn.ensemble and BalancedRandomForestClassifier

CODE: from imblearn.ensemble import BalancedRandomForestClassifier bal_forest = BalancedRandomForestClassifier(n_estimators=100, random_state=1) bal_forest.fit(X_train,…
1 vote, 0 answers

Outlier elimination in a imblearn pipeline affecting both X and y

I aim to integrate outlier elimination into a machine learning pipeline with a continuous dependent variable. The challenge is to keep X and y the same length, so I have to eliminate outliers in both datasets. As this task turned out to be…
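A minimal sketch using imblearn's FunctionSampler, which lets an arbitrary function drop rows from X and y together inside a pipeline; the IsolationForest filter is an assumption, not the asker's method, and validate=False permits a continuous y:

    from sklearn.ensemble import IsolationForest
    from imblearn import FunctionSampler

    def drop_outliers(X, y):
        # Keep only the rows that IsolationForest labels as inliers (+1).
        keep = IsolationForest(random_state=0).fit_predict(X) == 1
        return X[keep], y[keep]

    sampler = FunctionSampler(func=drop_outliers, validate=False)
    # Usable as a step in imblearn.pipeline.Pipeline; y is filtered with X.
    # X_res, y_res = sampler.fit_resample(X, y)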
1 vote, 1 answer

Performing Random Under-sampling after SMOTE using imblearn

I am trying to implement combining over-sampling and under-sampling using RandomUnderSampler() and SMOTE(). I am working on the loan_status dataset. I have done the following split. X = df.drop(['Loan_Status'],axis=1).values # independant…
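A minimal sketch of that combination, assuming binary labels and illustrative sampling_strategy values; the estimator is a placeholder:

    from sklearn.linear_model import LogisticRegression
    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import RandomUnderSampler
    from imblearn.pipeline import Pipeline

    pipe = Pipeline(steps=[
        # Over-sample the minority class up to half the majority size...
        ('over', SMOTE(sampling_strategy=0.5, random_state=42)),
        # ...then shrink the majority until minority/majority reaches 0.8.
        ('under', RandomUnderSampler(sampling_strategy=0.8, random_state=42)),
        ('clf', LogisticRegression(max_iter=1000)),
    ])
    # pipe.fit(X_train, y_train)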