Questions tagged [imblearn]

Python imbalanced-learn package. On datasets where one or more classes have significantly fewer or more training examples than the others, imbalanced-learning approaches can improve the results or the training speed of machine learning algorithms. These approaches use re-sampling techniques such as SMOTE, ADASYN, Tomek links, and their various combinations.

imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used on datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.

Most classification algorithms only perform optimally when the number of samples in each class is roughly the same. Highly skewed datasets, where the minority class is heavily outnumbered by one or more other classes, have proven to be a challenge while at the same time becoming more and more common.

One way of addressing this issue is to re-sample the dataset so as to offset the imbalance, in the hope of arriving at a more robust and fair decision boundary than you would otherwise.

Re-sampling techniques are divided into four categories:

    Under-sampling the majority class(es).
    Over-sampling the minority class.
    Combining over- and under-sampling.
    Creating balanced ensemble sets.
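
To make the first two categories concrete, here is a minimal standard-library sketch of random over-sampling and random under-sampling. It is not imbalanced-learn's implementation: the function name `random_resample` and its `mode` and `seed` parameters are hypothetical, chosen only for illustration, and the under-sampling branch draws without replacement.

```python
import random
from collections import Counter, defaultdict


def random_resample(X, y, mode="over", seed=0):
    """Balance classes by random re-sampling (illustrative sketch only).

    mode="over":  duplicate randomly chosen samples of the smaller classes
                  until every class matches the largest class.
    mode="under": keep a random subset of every class, sized to the
                  smallest class (drawn without replacement).
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")

    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    sizes = [len(idxs) for idxs in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    chosen = []
    for idxs in by_class.values():
        if mode == "over":
            # Draw the missing samples with replacement.
            extra = [rng.choice(idxs) for _ in range(target - len(idxs))]
            chosen.extend(idxs + extra)
        else:
            chosen.extend(rng.sample(idxs, target))

    return [X[i] for i in chosen], [y[i] for i in chosen]


# Toy data: 10 majority samples (label 0) and 2 minority samples (label 1).
X = [[i] for i in range(12)]
y = [0] * 10 + [1] * 2

X_over, y_over = random_resample(X, y, mode="over")
X_under, y_under = random_resample(X, y, mode="under")
print(Counter(y_over), Counter(y_under))
```

Over-sampling keeps every original sample but repeats minority samples, while under-sampling discards majority samples. Which trade-off is acceptable depends on how much data the majority class can afford to lose.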

Below is a list of the methods currently implemented in this module.

Under-sampling

Random majority under-sampling with replacement
Extraction of majority-minority Tomek links
Under-sampling with Cluster Centroids
NearMiss-(1 & 2 & 3)
Condensed Nearest Neighbour
One-Sided Selection
Neighbourhood Cleaning Rule
Edited Nearest Neighbours
Instance Hardness Threshold
Repeated Edited Nearest Neighbours
AllKNN

Over-sampling

Random minority over-sampling with replacement
SMOTE - Synthetic Minority Over-sampling Technique
bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
SVM SMOTE - Support Vectors SMOTE
ADASYN - Adaptive synthetic sampling approach for imbalanced learning

Over-sampling followed by under-sampling
- SMOTE + Tomek links
- SMOTE + ENN

Ensemble classifier using samplers internally
- EasyEnsemble
- BalanceCascade
- Balanced Random Forest
- Balanced Bagging
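
Of the over-sampling methods listed above, SMOTE differs from random over-sampling in that it creates new synthetic points instead of duplicating existing ones. The sketch below shows only the core interpolation idea using the standard library. It is a simplified illustration under stated assumptions, not the library's code: the name `smote_sketch` and its parameters `n_new`, `k`, and `seed` are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority points by interpolation.

    For each new point: pick a random minority sample, pick one of its
    k nearest minority neighbours, and place the new point at a random
    position on the line segment between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")

    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region the minority class already occupies. The Borderline and SVM variants listed above mainly differ in which base samples they choose to interpolate from.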

205 questions

votes

1 answer

TypeError: __init__() got an unexpected keyword argument 'random_state'

I tried to handle an imbalanced dataset using imblearn as: nm = NearMiss(random_state=42) X_bal,Y_bal = nm.fit_sample(x,y) But I am getting an unexpected error: TypeError: __init__() got an unexpected keyword argument 'random_state' How to fix this…

python imblearn

asked Apr 14 '21 at 06:46

aakriti aggarwal

votes

1 answer

Python package for SMOTEBoosting algorithm

I am looking for a Python package that implements the SMOTEBoosting algorithm. But I only find SMOTE in imbalanced-learn.

python imbalanced-data imblearn

asked Nov 26 '20 at 16:07

tides

votes

0 answers

Modify Balanced Random Forest Sampling

I want to change the sampling type in the BalancedRandomForest library from RandomUnderSampling to ClusterCentroid, I have changed this part self.base_sampler_ = RandomUnderSampler( sampling_strategy=self._sampling_strategy, …

python machine-learning scikit-learn random-forest imblearn

asked Oct 09 '20 at 17:32

gembul

votes

1 answer

What does it mean AttributeError: 'ColumnSelector' object has no attribute 'n_features_in_'?

I am running a grid search to tune the hyperparameters of a stacking estimator (StackingClassifier object from the sklearn.ensemble library). I am making use of the scikit library for ML, and the RandomizedSearchCV function. In addition to this, the base…

scikit-learn pipeline gridsearchcv imblearn mlxtend

asked Aug 03 '20 at 22:17

Jonathan

votes

1 answer

problem defining a custom metric to calculate "geometric mean score" for "tensorflow.keras"

I am working on an imbalanced classification problem in tensorflow.keras. And I decided to calculate "geometric mean score" as suggested by this answer on Cross Validated. I found an implementation of it in a package called imbalanced-learn and…

python tensorflow keras multiclass-classification imblearn

asked Jul 22 '20 at 06:50

Naveen Reddy Marthala

votes

2 answers

How to return text data as output after oversampling using SMOTE?

I have multi-class text data that I want to apply SMOTE to because of the minority labels. I already did this, but I'm getting a sparse matrix as my output. Is there a way to get the text data back after SMOTE? Here is my code sample: X_train =…

python imblearn

asked Jul 17 '20 at 14:09

Eniola

votes

2 answers

'NearMiss' object has no attribute '_validate_data'

The code below produces the error: from imblearn.under_sampling import NearMiss nm = NearMiss() X_res,y_res=nm.fit_sample(X,Y)

python imbalanced-data imblearn

asked Jul 08 '20 at 17:46

Yashraj Jain

votes

1 answer

Tensorflow, imblearn import issues:

What is this error exactly and how do I solve this? Running the latest version of TensorFlow and Keras TypeError Traceback (most recent call last) in 26 27 import…

python tensorflow keras imblearn

asked Jul 08 '20 at 01:57

Neel Save

votes

1 answer

Imbalanced-learn module base.py file syntax error coming up while importing SMOTE

I installed the imbalanced-learn package (Python 2.7) using: conda install -c conda-forge imbalanced-learn After installing it, I tried to import SMOTE from the package: from imblearn.over_sampling import SMOTE which gave the following error: File…

python python-2.7 anaconda imblearn smote

asked Jun 02 '20 at 08:06

HawkEye04

votes

1 answer

How to oversample a 3d array?

I'm trying to predict the category of a news article based on 2 features: author name and article headline. I have transformed both columns separately using CountVectorizer and TfidfTransformer. Thus, what I have now is a 3D array (i.e. array of list…

python numpy resampling oversampling imblearn

asked May 12 '20 at 11:50

Brian

votes

3 answers

How to get sample indices from RandomUnderSampler in imblearn

Does anyone know if/how one can get the indices of the selected samples after undersampling with imblearn's RandomUnderSampler? There used to be the argument "return_indices=True", which has now been removed in the new version and supposedly was…

python machine-learning imblearn

asked Mar 19 '20 at 17:47

ramobal

votes

1 answer

Imbalanced-learn: Import Error: cannot import name 'MultiOutputMixin'

I've re-installed the latest scikit-learn and imbalanced-learn. I've also checked all other libraries to make sure they are compatible with imbalanced-learn. I just want to run a simple RandomOverSample(), but I got the following import error…

python scikit-learn imbalanced-data imblearn

asked Feb 17 '20 at 01:16

Cassie.L

votes

1 answer

AttributeError: 'DataFrame' object has no attribute 'name' when using SMOTE

I am using imblearn over_sampling SMOTE technique in order to balance my imbalanced dataset. Here is my sample code import pandas as pd dataset=pd.read_csv('E://IOT_Netlume//hourly_data.csv') features= dataset.iloc[:,[1,2,3,4]] target=…

python-3.x dataframe oversampling imblearn smote

asked Dec 23 '19 at 09:54

DS_Geek

votes

2 answers

Passing GridSearchCV results to an Imbalanced-Learn's Pipeline object

Funny issue here - I have GridSearchCV results, which after cherry-picking from grid_search_cv.results_ attribute are captured as follows: Input: pd.DataFrame(grid_clf_rf.cv_results_).iloc[4966]['params'] Output: {'rf__max_depth': 40,…

python pandas scikit-learn imblearn

asked Jun 28 '19 at 00:12

Greem666

votes

1 answer

imblearn smote+enn under sampled the majority class

I have an imbalanced dataset, and when I try to balance it using SMOTEENN, the count of the majority class decreases by half. I tried changing the 'sampling_strategy' parameter with all the provided options, but it did not help. from imblearn.combine…

python machine-learning dataset imblearn

asked Apr 01 '19 at 19:20

ZaKad
