Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs in machine learning when:

"The user assigns more importance to the predictive performance... on a subset of the target variable domain."

"[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro (2016). A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.
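
A small sketch, on hypothetical data, of why this definition matters for evaluation: a classifier that always predicts the majority class reaches high accuracy while never detecting a single minority case. The helper functions are illustrative plain Python, not part of any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true `positive` cases that were predicted as `positive`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0


# Hypothetical data: 990 negatives (class 0) and 10 positives (class 1).
y_true = [0] * 990 + [1] * 10
# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)

print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```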

Software

imbalanced-learn: imblearn

Related Tags and Techniques

SMOTE: smote (Synthetic Minority Oversampling Technique)
Resampling: resampling
Oversampling: oversampling
Downsampling: downsampling
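
The random resampling techniques listed above can be sketched in a few lines of plain Python. imbalanced-learn implements them as samplers (for example RandomUnderSampler and RandomOverSampler); the helpers below are hypothetical, dependency-free illustrations of the idea, not that library's API.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Drop random samples from every class until all classes match the smallest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for c in counts:
        idx = [i for i, label in enumerate(y) if label == c]
        keep.extend(rng.sample(idx, target))  # sampling without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Duplicate random samples (with replacement) until all classes match the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for c, n in counts.items():
        idx = [i for i, label in enumerate(y) if label == c]
        for i in rng.choices(idx, k=target - n):  # sampling with replacement
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


# Hypothetical toy data: 9 majority samples (class 0), 3 minority samples (class 1).
X = [[float(i)] for i in range(12)]
y = [0] * 9 + [1] * 3
print(Counter(random_undersample(X, y)[1]))  # Counter({0: 3, 1: 3})
print(Counter(random_oversample(X, y)[1]))   # Counter({0: 9, 1: 9})
```

Undersampling discards majority-class information, while oversampling by duplication can encourage overfitting to the repeated minority samples; SMOTE instead creates new, interpolated minority samples.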

351 questions

0 answers

Precision significantly drops when using entire dataset to test a classifier trained on undersampled data

I'm doing the Kaggle Credit Card Fraud Detection. There is a significant imbalance between Class = 1 (fraudulent transaction) and Class = 0 (not fraudulent). To compensate, I undersampled the data so that there was a 1:1 ratio between fraudulent…

python machine-learning scikit-learn imbalanced-data

asked Mar 27 '20 at 17:25

Anuj S
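
One plausible factor, shown with hypothetical rates: precision depends on class prevalence. If the classifier's true-positive and false-positive rates stay the same, moving from a 1:1 test set to a heavily imbalanced one multiplies the number of false positives while the number of true positives stays constant. The helper `expected_precision` is hypothetical.

```python
def expected_precision(tpr, fpr, n_pos, n_neg):
    """Precision implied by a fixed true-positive rate and false-positive rate."""
    tp = tpr * n_pos  # expected true positives
    fp = fpr * n_neg  # expected false positives
    return tp / (tp + fp)


# Hypothetical classifier: catches 90% of positives, wrongly flags 5% of negatives.
tpr, fpr = 0.90, 0.05

# Balanced (1:1) test set, as after undersampling.
print(round(expected_precision(tpr, fpr, n_pos=500, n_neg=500), 3))      # 0.947
# The same classifier on a hypothetical 1:500 test set.
print(round(expected_precision(tpr, fpr, n_pos=500, n_neg=250_000), 3))  # 0.035
```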

0 answers

How to do an evaluation of Logistic Regression with imbalanced dataset using sklearn?

I am building a logistic regression model with Python's scikit-learn. I have an imbalanced dataset, with 2/3 of the data points having label y=0 and 1/3 having label y=1. I do a stratified split: X_train, X_test, y_train, y_test = train_test_split(X, y,…

python scikit-learn logistic-regression evaluation imbalanced-data

asked Mar 17 '20 at 11:30

LBoss
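
As a sketch of imbalance-aware evaluation, per-class precision, recall and F1 expose minority-class errors that overall accuracy hides. scikit-learn's classification_report reports these metrics; the plain-Python helper below (hypothetical name, made-up predictions with the same 2:1 ratio as in the question) shows what they compute.

```python
def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    return report


# Hypothetical test labels (6 of class 0, 3 of class 1) and predictions.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0]
for label, (p, r, f) in per_class_report(y_true, y_pred).items():
    print(label, round(p, 2), round(r, 2), round(f, 2))
```

Here accuracy is 6/9, but recall on class 1 is only 1/3, which the per-class view makes visible.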

1 answer

Dealing with imbalanced classification data?

I am building a predictive model to predict whether a client will subscribe again. I already have the dataset, and the problem is that it is imbalanced (there are more NOs than YESes). I believe that my model is biased, but when I check…

machine-learning imbalanced-data

asked Mar 16 '20 at 17:09

Yassire

1 answer

Text classification with imbalanced data

I am trying to classify 10,000 text samples into 20 classes. Four of the classes have just one sample each. I tried SMOTE to address this imbalance, but I am unable to generate new samples for classes that have only one record, though I could generate…

machine-learning nlp data-science text-classification imbalanced-data

asked Mar 16 '20 at 02:07

Sandeep Reddy
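
A simplified sketch of the core SMOTE step shows why classes with a single record cannot be oversampled this way: each synthetic point is an interpolation between a minority sample and one of its nearest neighbours from the same class, so at least two samples are required. imbalanced-learn's SMOTE exposes a k_neighbors parameter for the number of neighbours considered. The helper below is a hypothetical illustration on numeric vectors (text would first need to be vectorized, for example with TF-IDF), not the library's implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        # With a single sample there is no neighbour to interpolate towards.
        raise ValueError("SMOTE needs at least two samples of the class")
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
print(smote(minority, n_new=3, k=2))
try:
    smote([[5.0, 5.0]], n_new=3)  # a class with a single record
except ValueError as err:
    print(err)
```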

1 answer

Is it feasible to have the training set < the test set after undersampling the majority class?

I have a dataset of 1500 records with two imbalanced classes. Class 0 has 1300 records while Class 1 has 200 records, a ratio of about 6.5:1. I built a random forest with this dataset for classification. I know from past experience, if…

python tensorflow machine-learning data-science imbalanced-data

asked Mar 13 '20 at 09:33

randomforest1010
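
Assuming the split is done first and only the training majority class is undersampled (the test split keeps the original ratio of about 6.5:1), the arithmetic below uses the counts from the question to show when the training set becomes smaller than the test set. The helper name is hypothetical.

```python
def split_then_undersample(n_major, n_minor, test_fraction):
    """Sizes after a stratified split followed by undersampling the training
    majority class down to the training minority count (test set left as is)."""
    test_major = round(n_major * test_fraction)
    test_minor = round(n_minor * test_fraction)
    train_minor = n_minor - test_minor
    # Undersampling keeps only as many majority samples as minority samples.
    train_size = 2 * train_minor
    test_size = test_major + test_minor
    return train_size, test_size


for frac in (0.2, 0.3):
    print(frac, split_then_undersample(1300, 200, frac))
```

With a 20% test split the undersampled training set (320) is still slightly larger than the test set (300); with 30% it drops to 280 against 450.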

0 answers

Predict with scaled test data or not?

I have an imbalanced classification problem. First, I want to scale the data, then resample it with SMOTE. To prevent data leakage, I used a pipeline. My code is: X_train, X_test, Y_train, y_test = train_test_split(X, y, test_size = 0.20,…

python pipeline scaling imbalanced-data imblearn

asked Mar 10 '20 at 13:34

Parvin Khorasani
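
In an imbalanced-learn Pipeline, samplers such as SMOTE are applied only while fitting; at predict time the data passes through the transformers (such as the scaler, reusing the statistics learned from the training data) and then the model, without resampling. The plain-Python sketch below mirrors that order with hypothetical helper names and toy data.

```python
import statistics


def fit_scaler(X_train):
    """Learn per-feature mean and standard deviation from the training data only."""
    columns = list(zip(*X_train))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # avoid dividing by 0
    return means, stds


def transform(X, means, stds):
    """Standardise X with statistics learned from the training data."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


X_train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_test = [[2.0, 40.0]]

means, stds = fit_scaler(X_train)
X_train_scaled = transform(X_train, means, stds)
# ...resample X_train_scaled here (e.g. with SMOTE); the test set is never resampled...
X_test_scaled = transform(X_test, means, stds)  # same scaling, no refitting on test data
print(X_test_scaled)
```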

1 answer

Imbalanced-learn doesn't work even though it has been installed

This is odd. I'm using Python 3.7, and the dependencies of imbalanced-learn are satisfied too. However, when I import the library in Jupyter, it produces an error. Can anyone please advise? --> 13 from imblearn import FunctionSampler 14…

python scikit-learn jupyter-notebook imbalanced-data imblearn

asked Mar 02 '20 at 04:59

Cassie.L

0 answers

Imbalanced Class Learning

I'm dealing with an imbalanced classification problem in which the class ratio 0:1 is 717.26:1. I tried many models, of which I found GBM works best for my case. Then I came across a research paper and an article to deal with…

r machine-learning statistics data-science imbalanced-data

asked Feb 28 '20 at 12:14

Lokesh Arya

1 answer

How to handle an imbalanced dataset and outliers in Python?

I have two questions. If we have a classification problem with a DataFrame that has a large number of features (columns > 100), say 20/30 of them are highly correlated, and the target column (y) is very skewed towards one class, should we first…

python data-science outliers imbalanced-data

asked Feb 20 '20 at 10:16

Ajay Alex

1 answer

F-Score difference between cross_val_score and StratifiedKFold

I want to use a Random Forest Classifier on imbalanced data where X is a np.array representing the features and y is a np.array representing the labels (labels with 90% 0-values, and 10% 1-values). As I was not sure how to do stratification within…

python scikit-learn random-forest cross-validation imbalanced-data

asked Feb 17 '20 at 22:55

Sven
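
One likely source of the difference: for a classifier with an integer cv, cross_val_score uses StratifiedKFold without shuffling, so a StratifiedKFold created separately with shuffle=True (or a different random_state) produces different folds and therefore different scores. The simplified sketch below (hypothetical helper; scikit-learn's own fold assignment differs in detail) shows what stratification guarantees: each test fold keeps roughly the overall class ratio, here 90% / 10%.

```python
import random
from collections import defaultdict


def stratified_k_fold(y, k, shuffle=False, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep each class's share."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    rng = random.Random(seed)
    for idx in by_class.values():
        if shuffle:
            rng.shuffle(idx)
        # Deal each class's indices round-robin so every fold gets its share.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


y = [0] * 90 + [1] * 10
for train, test in stratified_k_fold(y, k=5):
    print(len(train), len(test), sum(y[i] for i in test))  # positives per test fold
```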

1 answer

Imbalanced dataset with Keras deep learning

I have a dataset that looks like this: Training (Class 0: 471, Class 1: 986), Testing (Class 0: 177, Class 1: 246). I split my data into 80% for training and 20% for validation. I know that it is an imbalanced dataset, and I have tried class_weight but…

python keras confusion-matrix imbalanced-data

asked Feb 12 '20 at 17:23

Khalid El Asnaoui
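
A sketch of the "balanced" weighting heuristic, which scikit-learn documents for class_weight='balanced' as n_samples / (n_classes * np.bincount(y)): each class receives a weight inversely proportional to its frequency, so both classes contribute the same total weight to the loss. The resulting dict can be passed to Keras as class_weight in model.fit. The helper below is a hypothetical plain-Python version using the training counts from the question.

```python
from collections import Counter


def balanced_class_weights(y):
    """weight[c] = n_samples / (n_classes * count[c]), the 'balanced' heuristic."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {c: n_samples / (n_classes * n) for c, n in sorted(counts.items())}


# Training counts from the question: 471 samples of class 0, 986 of class 1.
y_train = [0] * 471 + [1] * 986
weights = balanced_class_weights(y_train)
print(weights)  # approximately {0: 1.547, 1: 0.739}
# In Keras: model.fit(X_train, y_train, class_weight=weights, ...)
```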

2 answers

SMOTE-NC in R. No packages found

I have a dataset with 5 nominal and 37 categorical variables. I want to perform oversampling in R. However, with SMOTE, I cannot do so. I looked for SMOTE-NC as advised by (Chawla, Bowyer and Hall, 2002), but I could not find any package supporting…

r oversampling imbalanced-data smote

asked Dec 31 '19 at 15:24

Kambiz Rakhshan

1 answer

Using SMOTEENN in GridSearchCV Pipeline with Preprocessing

I am working on a classification problem with a highly imbalanced dataset. I am trying to use SMOTEENN in the grid search pipeline, however I keep getting this ValueError: ValueError: Invalid parameter randomforestclassifier for estimator…

random-forest sklearn-pandas grid-search imblearn imbalanced-data

asked Dec 29 '19 at 04:19

David Kebudi
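
In a scikit-learn or imbalanced-learn pipeline, GridSearchCV routes each grid key of the form <step>__<param> to the step with that name, and make_pipeline names steps after the lower-cased class name (for example randomforestclassifier). An "Invalid parameter" error often points to a key whose prefix does not match any step of the estimator passed to the search. The sketch below mimics that naming rule with stand-in classes and a hypothetical checker; it does not import either library.

```python
def default_step_name(estimator):
    """make_pipeline-style step name: the lower-cased class name."""
    return type(estimator).__name__.lower()


def check_param_grid(step_names, param_grid):
    """Return grid keys whose '<step>__' prefix does not match any pipeline step."""
    bad = []
    for key in param_grid:
        step, sep, _ = key.partition("__")
        if not sep or step not in step_names:
            bad.append(key)
    return bad


# Hypothetical stand-ins for the real estimator classes.
class SMOTEENN: ...
class RandomForestClassifier: ...


steps = [default_step_name(e) for e in (SMOTEENN(), RandomForestClassifier())]
print(steps)  # ['smoteenn', 'randomforestclassifier']

grid = {"randomforestclassifier__n_estimators": [100, 300], "max_depth": [None, 10]}
print(check_param_grid(steps, grid))  # ['max_depth']
```

If the steps are named explicitly, for example Pipeline([("rf", ...)]), the prefix must be rf__ instead.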

1 answer

Adjust predicted probability after SMOTE

I have an imbalanced dataset, and I used SMOTE to oversample the minority class and undersample the majority class. Now I want to check the test AUC using the model's predict_proba. I have two questions: 1. Do I have to correct the probability if I…

probability calibration imbalanced-data smote

asked Nov 21 '19 at 16:52

anat
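
A sketch of the standard prior-shift correction, assuming the model was trained on a resampled set with positive rate train_prior while the real positive rate is true_prior: the predicted odds are rescaled by the ratio of the two priors. The correction is exact for random resampling that leaves the class-conditional distributions unchanged; with SMOTE's synthetic samples it is only an approximation. Because the mapping is strictly increasing, it preserves the ranking of the scores and therefore leaves ROC AUC unchanged; it matters when calibrated probabilities or a decision threshold are needed. The function name and numbers are hypothetical.

```python
def correct_for_resampling(p, train_prior, true_prior):
    """Map a positive-class probability learned under `train_prior`
    back to `true_prior` (Bayes prior-shift correction)."""
    pos = p * true_prior / train_prior
    neg = (1 - p) * (1 - true_prior) / (1 - train_prior)
    return pos / (pos + neg)


# Hypothetical: model trained on a 50/50 resampled set, real positive rate 10%.
scores = [0.2, 0.5, 0.9]
corrected = [correct_for_resampling(p, 0.5, 0.1) for p in scores]
print([round(c, 3) for c in corrected])  # ranking of the scores is unchanged
```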

1 answer

What is the set of negative data points for each classifier when using OneVsRest classification in scikit-learn?

I am trying to train a one-vs-all multiclass logistic regression model using sklearn.linear_model.LogisticRegression(multi_class='ovr'). My dataset has over 1000 classes and 2 million training examples. From what I understood, this method will…

machine-learning scikit-learn multiclass-classification imbalanced-data

asked Nov 21 '19 at 06:52

luqmankgp
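
In a one-vs-rest scheme, each binary classifier treats the samples of its own class as positives and the samples of all other classes as negatives, so with over 1000 classes each sub-problem is usually extremely imbalanced. A minimal sketch of how one binary target is built, with a hypothetical helper and toy labels:

```python
from collections import Counter


def one_vs_rest_targets(y, positive_class):
    """Binary targets for one OvR sub-problem: 1 for positive_class, 0 for every other class."""
    return [1 if label == positive_class else 0 for label in y]


# Hypothetical 4-class label vector.
y = ["a", "a", "a", "a", "b", "b", "c", "d"]
for cls in sorted(set(y)):
    print(cls, Counter(one_vs_rest_targets(y, cls)))
```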
