Questions tagged [imblearn]

Python imbalanced-learning package. To improve the results or training speed of machine learning algorithms on datasets where one or more classes have significantly fewer or more training examples than the others, you can use an imbalanced-learning approach. Imbalanced-learning methods use re-sampling techniques such as SMOTE, ADASYN, Tomek links, and various combinations of them.

imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used on datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.

Most classification algorithms will only perform optimally when the number of samples of each class is roughly the same. Highly skewed datasets, where the minority is heavily outnumbered by one or more classes, have proven to be a challenge while at the same time becoming more and more common.

One way of addressing this issue is to re-sample the dataset so as to offset the imbalance, in the hope of arriving at a more robust and fair decision boundary than you would otherwise obtain.
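As a rough illustration of what re-sampling does to the class counts, here is a minimal standard-library sketch of random over- and under-sampling. The function name `random_resample` and the toy data are made up for this example; this is not imbalanced-learn's implementation, and in practice you would reach for the package's own `RandomOverSampler` and `RandomUnderSampler` classes.

```python
import random
from collections import Counter


def random_resample(X, y, mode="over", seed=0):
    """Balance classes by random re-sampling (illustrative sketch only)."""
    rng = random.Random(seed)

    # Group the feature rows by class label.
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)

    # Over-sampling grows every class to the largest class size,
    # under-sampling shrinks every class to the smallest one.
    sizes = [len(rows) for rows in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    X_res, y_res = [], []
    for label, rows in by_class.items():
        if len(rows) > target:
            # Under-sample: keep a random subset, without replacement.
            chosen = rng.sample(rows, target)
        else:
            # Over-sample: keep the originals, then draw extras with replacement.
            chosen = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        X_res.extend(chosen)
        y_res.extend([label] * target)
    return X_res, y_res


# Toy data: 10 samples of class 0 and 2 samples of class 1.
X = [[i] for i in range(12)]
y = [0] * 10 + [1] * 2

X_over, y_over = random_resample(X, y, mode="over")
X_under, y_under = random_resample(X, y, mode="under")
print(Counter(y_over), Counter(y_under))
```

Over-sampling keeps every original row and duplicates minority rows, while under-sampling throws majority rows away, so the two trade training-set size against information loss.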

Re-sampling techniques are divided into four categories:

    Under-sampling the majority class(es).
    Over-sampling the minority class.
    Combining over- and under-sampling.
    Creating ensembles of balanced sets.

Below is a list of the methods currently implemented in this module.

Under-sampling

  1. Random majority under-sampling with replacement
  2. Extraction of majority-minority Tomek links
  3. Under-sampling with Cluster Centroids
  4. NearMiss-(1 & 2 & 3)
  5. Condensed Nearest Neighbour
  6. One-Sided Selection
  7. Neighbourhood Cleaning Rule
  8. Edited Nearest Neighbours
  9. Instance Hardness Threshold
  10. Repeated Edited Nearest Neighbours
  11. AllKNN
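To make one of the cleaning-style methods above concrete: a Tomek link is a pair of samples from different classes that are each other's nearest neighbour. The sketch below finds such pairs with brute-force distances and drops the majority-class member of each pair. The helper names and the toy data are hypothetical, and this is only an illustration of the idea, not the package's `TomekLinks` sampler.

```python
import math


def nearest_neighbour(i, X):
    """Index of the point closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links."""
    nn = [nearest_neighbour(i, X) for i in range(len(X))]
    # A link needs mutual nearest neighbours with different labels.
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_majority_in_links(X, y, majority_label):
    """Drop the majority-class sample of every Tomek link."""
    drop = {k for pair in tomek_links(X, y) for k in pair
            if y[k] == majority_label}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


# Toy 1-D data: the points at 2.9 (class 0) and 3.1 (class 1) sit on the border.
X = [[0.0], [1.0], [2.9], [3.1], [6.0]]
y = [0, 0, 0, 1, 1]

links = tomek_links(X, y)
X_clean, y_clean = remove_majority_in_links(X, y, majority_label=0)
print(links, X_clean, y_clean)
```

The brute-force neighbour search is quadratic in the number of samples, which is fine for a sketch but not for a dataset of realistic size.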

Over-sampling

  12. Random minority over-sampling with replacement
  13. SMOTE - Synthetic Minority Over-sampling Technique
  14. bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
  15. SVM SMOTE - Support Vectors SMOTE
  16. ADASYN - Adaptive synthetic sampling approach for imbalanced learning
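The core idea behind SMOTE can be sketched in a few lines: pick a minority sample, pick one of its k nearest minority neighbours, and create a synthetic point somewhere on the segment between them. The function `smote_sketch` and its parameters are invented for this illustration and simplify the real algorithm; the package's `SMOTE` class is what you would use in practice. The sketch also shows why the minority class needs more than k samples: otherwise there are not enough neighbours to choose from.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by interpolation (sketch)."""
    if len(minority) <= k:
        # Each point needs k neighbours other than itself.
        raise ValueError("need more than k minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class with four 2-D samples.
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 2.5]]
new_points = smote_sketch(minority, n_new=6, k=2)
print(new_points)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region the minority class already occupies.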

  1. Over-sampling followed by under-sampling

    • SMOTE + Tomek links
    • SMOTE + ENN
  2. Ensemble classifier using samplers internally

    • EasyEnsemble
    • BalanceCascade
    • Balanced Random Forest
    • Balanced Bagging
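The ensemble methods listed above share one building block: several balanced training sets are drawn from the same imbalanced data, and one learner is fitted per set. The sketch below shows only that subset-drawing step for a binary problem, using random under-sampling of the majority class. The function `balanced_subsets` and the toy data are hypothetical, and the sketch does not reproduce how the package's ensemble classifiers are implemented.

```python
import random


def balanced_subsets(X, y, minority_label, n_subsets=3, seed=0):
    """Draw several balanced subsets from a binary dataset (sketch)."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]

    subsets = []
    for _ in range(n_subsets):
        # Keep all minority samples and a fresh random draw of majority samples.
        picked = rng.sample(majority, len(minority))
        idx = minority + picked
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


# Toy data: 8 samples of class 0 and 2 samples of class 1.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

subsets = balanced_subsets(X, y, minority_label=1)
for X_sub, y_sub in subsets:
    print(y_sub)
```

Each subset discards most of the majority class, but across the ensemble different majority samples get used, which is the motivation for combining the learners.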

Resources:

205 questions
-1
votes
1 answer

scikit-learn error although it is properly installed

My code is as follows: from imblearn import over_sampling I get this error: cannot import name 'DistanceMetric' from 'sklearn.metrics' A simple import on imblearn is giving the same error. I tried reinstalling scikit learn and scipy, but I still…
ella
-1
votes
1 answer

TypeError: fit_resample() missing 1 required positional argument: 'y'

Using imblearn for imbalanced datasets, the parameters seem to have changed. I am using under_sampling.NearMiss. Here is the code: from imblearn import under_sampling balanced = under_sampling.NearMiss() X_res, y_res =…
Vishal Rana
-1
votes
1 answer

Python: name 'RandomOverSampler' is not defined

I am trying to use imblearn to do some over- and under-sampling on a dataframe. However, when calling either function (e.g. RandomOverSampler), it says that it is not defined. The imblearn library is included: import imblearn. When calling…
Abdul Ali
-1
votes
1 answer

How to get indices of created samples in Imblearn

I am using different imblearn over-sampling methods on a dataset which contains ~55800 samples. About 200 are class 1, the rest class 0. I am oversampling class 1 with various over-sampling strategies. It does not improve my model quality and…
Andreas bleYel
-1
votes
2 answers

How to fix samples < K-neighbours error in oversampling using SMOTE?

I am designing a multi-class classifier for 11 labels. I am using SMOTE to tackle the sampling problem. However, I face the following error: Error at SMOTE from imblearn.over_sampling import SMOTE sm = SMOTE(random_state=42) X_res, Y_res =…
-1
votes
1 answer

Why does imblearn work with Jupyter Notebook on Python 2 but not 3?

I'm trying to run some code, but I get an error. If I have the kernel on Python 2, then imblearn works smoothly, but pipe_grid doesn't work. When I switch to Python 3, pipe_grid works but imblearn stops working. I didn't share the code because it is…
-2
votes
1 answer

Unable to import from imblearn.over_sampling import SMOTE

I have installed imblearn using pip install -U imbalanced-learn #version: conda version : 4.4.10 conda-build version : 3.4.1 python version : 3.6.4.final.0 I keep getting errors related to numpy and scipy, like module 'numpy.random' has no…
aim
-3
votes
1 answer

ModuleNotFoundError: No module named 'imblearn', tried install package

I wish to import some libraries from imblearn (from imblearn.over_sampling import RandomOverSampler), but this error occurred: ModuleNotFoundError: No module named 'imblearn'. I already tried pip install imbalanced-learn and conda install -c conda-forge…
Lucas Liu
-3
votes
2 answers

Pipeline for RandomOverSampler, RandomForestClassifier & GridSearchCV

I am working on a binary text classification problem. As the classes are highly imbalanced, I am using sampling techniques like RandomOverSampler(). Then for classification I would use RandomForestClassifier(), whose parameters need to be tuned using…