Questions tagged [imblearn]

Python imbalanced-learn package. On datasets where one or more classes have significantly fewer or more training examples than the others, imbalanced learning approaches can improve the results or training speed of machine learning algorithms. These methods use re-sampling techniques such as SMOTE, ADASYN, Tomek links, and combinations of them.

imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used on datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.

Most classification algorithms will only perform optimally when the number of samples of each class is roughly the same. Highly skewed datasets, where the minority class is heavily outnumbered by one or more other classes, have proven to be a challenge while at the same time becoming more and more common.

One way of addressing this issue is to re-sample the dataset so as to offset the imbalance, in the hope of arriving at a more robust and fair decision boundary than you would otherwise get.

Re-sampling techniques are divided into four categories:

    Under-sampling the majority class(es).
    Over-sampling the minority class.
    Combining over- and under-sampling.
    Creating ensemble balanced sets.
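
To make the over-sampling category concrete, here is a minimal sketch of random over-sampling with replacement, written with the standard library only. It is not imbalanced-learn's implementation; the function name `random_oversample` and the toy data are assumptions made for illustration. In the package itself the equivalent entry point would be a sampler such as `RandomOverSampler`.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Draw with replacement from the rows of this class.
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Toy data: four samples of class 0, one sample of class 1.
X = [[0.0], [0.1], [0.2], [0.3], [5.0]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Under-sampling is the mirror image: instead of duplicating minority rows, rows of the majority class are dropped until the class counts match.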

Below is a list of the methods currently implemented in this module.

Under-sampling

  1. Random majority under-sampling with replacement
  2. Extraction of majority-minority Tomek links
  3. Under-sampling with Cluster Centroids
  4. NearMiss-(1 & 2 & 3)
  5. Condensed Nearest Neighbour
  6. One-Sided Selection
  7. Neighbourhood Cleaning Rule
  8. Edited Nearest Neighbours
  9. Instance Hardness Threshold
  10. Repeated Edited Nearest Neighbours
  11. AllKNN
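
As an illustration of one of the under-sampling methods listed above, the sketch below finds Tomek links in a toy dataset. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour; under-sampling then drops the majority-class member of each pair. This is a brute-force sketch using Euclidean distance, not the package's code, and the helper names `nearest` and `tomek_links` are made up for the example (it needs Python 3.8+ for `math.dist`).

```python
import math


def nearest(i, X):
    """Index of the sample closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Return index pairs (i, j) that are mutual nearest neighbours
    and belong to different classes."""
    links = []
    for i in range(len(X)):
        j = nearest(i, X)
        if i < j and nearest(j, X) == i and y[i] != y[j]:
            links.append((i, j))
    return links


# Toy 1-D data: class 0 is the majority, class 1 the minority.
X = [[0.0], [0.9], [2.0], [2.4], [5.0]]
y = [0, 0, 0, 1, 1]

links = tomek_links(X, y)

# Under-sample by removing the majority-class member of every link.
majority = 0
drop = {i for pair in links for i in pair if y[i] == majority}
X_clean = [x for k, x in enumerate(X) if k not in drop]
y_clean = [lab for k, lab in enumerate(y) if k not in drop]
print(links, y_clean)
```

Samples 0 and 1 are also mutual nearest neighbours, but they share a class, so they do not form a link.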

Over-sampling

  12. Random minority over-sampling with replacement
  13. SMOTE - Synthetic Minority Over-sampling Technique
  14. bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
  15. SVM SMOTE - Support Vectors SMOTE
  16. ADASYN - Adaptive synthetic sampling approach for imbalanced learning
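
The core idea behind SMOTE can be sketched in a few lines: pick a minority sample, pick one of its k nearest minority neighbours, and create a new point at a random position on the line segment between them. The code below is a simplified stdlib-only sketch under that assumption, not the package's implementation; `smote_sketch`, its parameters, and the toy points are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (brute force).
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features per sample.
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]]
synthetic = smote_sketch(minority, n_new=5)
for point in synthetic:
    print(point)
```

Because the new points are interpolated, their feature values are generally not copies of existing values, and the technique only applies to the feature matrix, while each synthetic row keeps the minority class label.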

  1. Over-sampling followed by under-sampling

    • SMOTE + Tomek links
    • SMOTE + ENN
  2. Ensemble classifier using samplers internally

    • EasyEnsemble
    • BalanceCascade
    • Balanced Random Forest
    • Balanced Bagging


205 questions
0
votes
1 answer

Using SMOTE to oversample a binary class; why is it returning random float values between 0 and 1?

I'm using SMOTE to resample a binary class TARGET_FRAUD which includes values 0 and 1. 0 has around 900 records, while 1 only has about 100 records. I want to oversample class 1 to around 800. This is to perform some classification modeling. #fix…
vnguyen56
  • 55
  • 6
0
votes
0 answers

Code works sometimes but sometimes get TypeError: issubclass() when using imblearn's SMOTE

Trying to implement the code on here. The code was working fine then stopped, restarted updated and etc. Keep getting TypeError: issubclass() arg 2 must be a class or tuple of classes smote = SMOTE(random_state = 45) X_train1, X_test1, y_train1,…
John Ketterer
  • 137
  • 1
  • 1
  • 9
0
votes
0 answers

Upsampling using SMOTE in python

I am trying to use SMOTE in Python to handle a highly imbalanced data set. After splitting the data set into train and test I generate synthetic samples using SMOTE. Then I use the xgboost algorithm on the SMOTE generated data. My model output is to…
0
votes
1 answer

Not able to import SMOTENC

I am able to import SMOTE from imblearn library but when importing SMOTENC it is throwing an error: 'ImportError: cannot import name 'SMOTENC'' I have tried changing the version of imblearn but no luck there from imblearn.over_sampling import…
0
votes
1 answer

BalancedBatchGenerator throws AttributeError model.fit_generator

I am new to tf and keras. I am using a Colab notebook with Python 3 and tensorflow.keras 2.2.4-tf. Calling pip list shows imbalanced-learn 0.4.3 and imblearn 0.0. I am trying to use from imblearn.keras…
rskd
  • 31
  • 1
  • 6
0
votes
1 answer

Random forest: balancing test set?

I am trying to run a Random Forest Classifier on an imbalanced dataset (~1:4). I am using the method from imblearn as follows: from imblearn.ensemble import…
Ramtin
  • 3
  • 1
0
votes
1 answer

Error while import imblearn.undersampling

I am getting a "no module name sklearn.cluster" error while importing the imblearn_undersampling module. I do not get an error while importing SMOTE from imblearn, as shown in the pic. Some of the solutions I have tried: -Uninstalled and reinstalled…
0
votes
1 answer

Problem with importing RUSBoostClassifier from imblearn

I have gotten stuck with importing RUSBoostClassifier following this example from imblearn.ensemble import RUSBoostClassifier I receive the following error: ImportError Traceback (most recent call…
0
votes
0 answers

Skip some of the transform steps (related to over and under sampling) in imbalanced-learn pipeline when predicting on test data set

For an imbalanced classification problem, I am using an imblearn pipeline along with sklearn's GridSearchCV (to find best hyper-params). The steps in the pipeline are as follows: Standardize each feature Correct for class imbalance by using ADASYN…
Chaos
  • 466
  • 1
  • 5
  • 12
0
votes
3 answers

Issues while importing imblearn

I am trying to import SMOTE in my Jupyter notebook. I tried the following steps: I first installed imblearn using the following command in my terminal conda install -c glemaitre imbalanced-learn Then I used the following command to import imblearn…
learning_python
  • 111
  • 2
  • 11
0
votes
1 answer

Python Sklearn / Scikit & cx_freeze: module 'sklearn.tree._criterion' has no attribute 'Criterion'

I am currently trying to put a Python app, which uses Sklearn modules, in a stand-alone .exe file. My current cx_freeze setup.py looks like this: import os from cx_Freeze import setup, Executable base = "Win32GUI" os.environ['TCL_LIBRARY'] =…
Project_Prkt
  • 91
  • 1
  • 12
0
votes
1 answer

Difference between over sampling and upsampling and between SMOTE and over_sampling.SMOTE?

This question is a bit of paranoia, as Google search results get mixed up with audio, Fourier transform topics, etc. Specifically for Python, when it comes to numeric data, is there a difference between oversampling and upsampling of the minority…
user9238790
0
votes
0 answers

oversampling doesn't generate new samples

My dataset has the following distribution: class frequency 0 960 1 2093 2 22696 3 1116 4 2541 5 1298 6 14 I am using python-imblearn to oversample the minority class. With regular smote I am…
0
votes
1 answer

Classification report for cross validation pipeline

I am using Pipelines in Cross validations with SMOTE (imblearn library) for checking unbalanced dataset of fraud and non-fraud customers gbm0 = GradientBoostingClassifier(random_state=10) samplers = [['SMOTE',…
0
votes
1 answer

SMOTE algorithm initial condition

I am using SMOTE algorithm from the python imbalanced-learn package: from imblearn.over_sampling import SMOTE sm = SMOTE(kind='regular', n_neighbors = 4) : X_train_resampled, y_train_resampled = sm.fit_sample(X_train, y_train) I have explicitly…
Edamame
  • 23,718
  • 73
  • 186
  • 320