Questions tagged [imblearn]

Python imbalanced-learn package. On datasets where one or more classes have significantly fewer or more training examples than the others, imbalanced-learning methods can improve the results or the speed of machine learning algorithms. These methods use re-sampling techniques such as SMOTE, ADASYN, Tomek links, and combinations of them.

imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used on datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.

Most classification algorithms perform optimally only when the number of samples in each class is roughly the same. Highly skewed datasets, where the minority class is heavily outnumbered by one or more other classes, have proven to be a challenge while at the same time becoming more and more common.

One way of addressing this issue is to re-sample the dataset so as to offset the imbalance, in the hope of arriving at a more robust and fair decision boundary than you would otherwise get.
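As a rough illustration of what re-sampling means, the sketch below duplicates randomly chosen minority rows until every class is as large as the biggest one. It uses only the standard library and is not imbalanced-learn's own implementation; the function name `random_oversample`, its `seed` parameter, and the toy data are assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate random rows of the smaller classes
    (sampling with replacement) until all classes match the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # Rows belonging to this class in the original data.
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_res.append(rng.choice(rows))
            y_res.append(label)
    return X_res, y_res


# Toy data: six samples of class 0, two of class 1.
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [5.0], [5.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have six samples
```

The original rows are kept untouched at the front of the result; only duplicates of minority rows are appended.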

Re-sampling techniques are divided into four categories:

    Under-sampling the majority class(es).
    Over-sampling the minority class.
    Combining over- and under-sampling.
    Creating balanced ensemble sets.

Below is a list of the methods currently implemented in this module.

Under-sampling

  1. Random majority under-sampling with replacement
  2. Extraction of majority-minority Tomek links
  3. Under-sampling with Cluster Centroids
  4. NearMiss-(1 & 2 & 3)
  5. Condensed Nearest Neighbour
  6. One-Sided Selection
  7. Neighbourhood Cleaning Rule
  8. Edited Nearest Neighbours
  9. Instance Hardness Threshold
  10. Repeated Edited Nearest Neighbours
  11. AllKNN
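To make one of the under-sampling methods above concrete, here is a minimal sketch of Tomek-link removal. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour, and the under-sampling variant drops the majority-class member of each pair. This is a brute-force, standard-library illustration rather than the package's code; the helper name `tomek_links`, the choice of class 0 as the majority class, and the toy data are assumptions.

```python
import math


def tomek_links(X, y):
    """Hypothetical helper: return index pairs (i, j), i < j, that are
    mutual nearest neighbours and carry different labels."""
    def nearest(i):
        # Brute-force nearest neighbour by Euclidean distance.
        others = (j for j in range(len(X)) if j != i)
        return min(others, key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


# Toy 1-D data: class 0 is the majority; 2.9 and 3.1 sit on the class boundary.
X = [[0.0], [1.0], [2.0], [2.9], [3.1], [10.0]]
y = [0, 0, 0, 0, 1, 1]

links = tomek_links(X, y)
print(links)  # [(3, 4)]

# Drop the majority-class member of every link.
majority = 0
drop = {i if y[i] == majority else j for i, j in links}
X_res = [x for k, x in enumerate(X) if k not in drop]
y_res = [lab for k, lab in enumerate(y) if k not in drop]
```

Removing only the majority member cleans the boundary without discarding any minority samples.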

Over-sampling

  12. Random minority over-sampling with replacement
  13. SMOTE - Synthetic Minority Over-sampling Technique
  14. bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
  15. SVM SMOTE - Support Vectors SMOTE
  16. ADASYN - Adaptive synthetic sampling approach for imbalanced learning
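The core idea behind SMOTE can be sketched in a few lines: pick a minority sample, pick one of its k nearest minority neighbours, and create a new point somewhere on the line segment between them. The code below is a simplified standard-library illustration of that interpolation step only, not the package's implementation; the function name `smote_like`, its parameters, and the toy points are assumptions, and it expects at least two minority samples.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Hypothetical helper: create n_new synthetic points by interpolating
    between a random minority sample and one of its k nearest minority
    neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class with four 2-D samples.
minority = [[1.0, 1.0], [1.5, 1.2], [2.0, 1.0], [1.2, 1.8]]
new_points = smote_like(minority, n_new=3)
print(new_points)
```

Because every synthetic point lies between two existing minority samples, the new data stays inside the region the minority class already occupies instead of simply repeating rows.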

  1. Over-sampling followed by under-sampling

    • SMOTE + Tomek links
    • SMOTE + ENN
  2. Ensemble classifier using samplers internally

    • EasyEnsemble
    • BalanceCascade
    • Balanced Random Forest
    • Balanced Bagging


205 questions
2
votes
1 answer

TypeError: __init__() got an unexpected keyword argument 'random_state'

I tried to handle an imbalanced dataset using imblearn as: nm = NearMiss(random_state=42) X_bal,Y_bal = nm.fit_sample(x,y) But I am getting an unexpected error: TypeError: __init__() got an unexpected keyword argument 'random_state' How to fix this…
2
votes
1 answer

Python package for SMOTEBoosting algorithm

I am looking for a Python package that implements the SMOTEBoosting algorithm. But I only find SMOTE in imbalanced-learn.
tides
  • 25
  • 3
2
votes
0 answers

Modify Balanced Random Forest Sampling

I want to change the sampling type in the BalancedRandomForest library from RandomUnderSampling to ClusterCentroid. I have changed this part: self.base_sampler_ = RandomUnderSampler( sampling_strategy=self._sampling_strategy, …
2
votes
1 answer

What does it mean AttributeError: 'ColumnSelector' object has no attribute 'n_features_in_'?

I am running a grid search to tune the hyperparameters of a stacking estimator (StackingClassifier object from the sklearn.ensemble library). I am making use of the scikit library for ML, and the RandomizedSearchCV function. In addition to this, the base…
2
votes
1 answer

problem defining a custom metric to calculate "geometric mean score" for "tensorflow.keras"

I am working on an imbalanced classification problem in tensorflow.keras, and I decided to calculate the "geometric mean score" as suggested by this answer on Cross Validated. I found an implementation of it in a package called imbalanced-learn and…
2
votes
2 answers

How to return text data as output after oversampling using SMOTE?

I have multi-class text data which I want to over-sample with SMOTE because of the minority labels. I already did this, but I'm getting a sparse matrix as my output. Is there a way to get the text data back after SMOTE? Here is my code sample: X_train =…
Eniola
  • 133
  • 10
2
votes
2 answers

'NearMiss' object has no attribute '_validate_data'

This is the code that produces the error: from imblearn.under_sampling import NearMiss nm = NearMiss() X_res,y_res=nm.fit_sample(X,Y)
Yashraj Jain
  • 88
  • 2
  • 8
2
votes
1 answer

Tensorflow, imblearn import issues:

What is this error exactly and how do I solve this? Running the latest version of TensorFlow and Keras TypeError Traceback (most recent call last) in 26 27 import…
Neel Save
  • 21
  • 3
2
votes
1 answer

Imbalances-learn module base.py file syntax error coming up while importing SMOTE

I installed imbalanced-learn package using (Python 2.7): conda install -c conda-forge imbalanced-learn after installing it, I tried to import SMOTE from the package. from imblearn.over_sampling import SMOTE which gave the following error: File…
HawkEye04
  • 23
  • 3
2
votes
1 answer

How to oversample a 3d array?

I'm trying to predict the category of a news article based on 2 features: author name and article headline. I have transformed both columns separately using CountVectorizer and TfidfTransformer. Thus, what I have now is a 3D array (ie. array of list…
Brian
  • 33
  • 1
  • 6
2
votes
3 answers

How to get sample indices from RandomUnderSampler in imblearn

Does anyone know if/how one can get the indices of the selected samples after undersampling with imblearn's RandomUnderSampler? There used to be the argument "return_indices=True", which has now been removed in the new version and supposedly was…
ramobal
  • 241
  • 2
  • 9
2
votes
1 answer

Imbalanced-learn: Import Error: cannot import name 'MultiOutputMixin'

I've re-installed the latest scikit-learn and imbalanced-learn. I've also checked all other libraries to make sure they are compatible with imbalanced-learn. I just want to run a simple RandomOverSample(), but I got the following import error…
Cassie.L
  • 311
  • 1
  • 7
  • 19
2
votes
1 answer

AttributeError: 'DataFrame' object has no attribute 'name' when using SMOTE

I am using imblearn over_sampling SMOTE technique in order to balance my imbalanced dataset. Here is my sample code import pandas as pd dataset=pd.read_csv('E://IOT_Netlume//hourly_data.csv') features= dataset.iloc[:,[1,2,3,4]] target=…
DS_Geek
  • 53
  • 1
  • 10
2
votes
2 answers

Passing GridSearchCV results to an Imbalanced-Learn's Pipeline object

Funny issue here - I have GridSearchCV results, which after cherry-picking from grid_search_cv.results_ attribute are captured as follows: Input: pd.DataFrame(grid_clf_rf.cv_results_).iloc[4966]['params'] Output: {'rf__max_depth': 40,…
Greem666
  • 919
  • 13
  • 24
2
votes
1 answer

imblearn smote+enn under sampled the majority class

I have an imbalanced dataset, and when I try to balance it using SMOTEENN, the count of the majority class decreases by half. I tried changing the 'sampling_strategy' parameter, with all the provided options, but it did not help. from imblearn.combine…
ZaKad
  • 41
  • 3