Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs in machine learning when:

"The user assigns more importance to the predictive performance... on a subset of the target variable domain."

"[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro (2016). A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, 49(2).

Software

imbalanced-learn: imblearn

Related Tags and Techniques

SMOTE: smote (Synthetic Minority Oversampling Technique)
Resampling: resampling
Oversampling: oversampling
Downsampling: downsampling
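The core idea behind SMOTE can be sketched in plain Python: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its k nearest minority-class neighbours. This is a simplified illustration, not imbalanced-learn's implementation. The function name `smote` and its parameters are hypothetical; in practice you would use `imblearn.over_sampling.SMOTE` (whose `k_neighbors` parameter plays the role of `k` here).

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class points, SMOTE-style (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because new points are interpolations, SMOTE needs numeric features; random oversampling and downsampling, by contrast, only duplicate or drop existing rows.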

351 questions


0 answers

Fit the model using the entire data or only the training data?

I am given two datasets: first, training data with known classes (targets); second, test data without classes (no targets). I split the training data into a training set and a validation set. I oversample the training set and test it on my validation set. It…

training-data imbalanced-data smote oversampling

asked Oct 03 '22 at 21:12

Rotimi Omosewo
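The workflow described in the question above (split first, then oversample only the training part) can be sketched as follows, so that the validation rows keep the original class distribution and no duplicated row leaks into them. This is a minimal sketch assuming random oversampling and a plain shuffled split; the helper names are hypothetical. A stratified split (e.g. scikit-learn's `train_test_split(..., stratify=y)`) would be preferable on real data, and imbalanced-learn's `Pipeline` applies samplers only when fitting.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random rows of each smaller class until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def split_then_oversample(X, y, val_fraction=0.25, seed=0):
    """Hold out a validation split first, then resample only the training part."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    n_val = int(len(idx) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_val = [X[i] for i in val_idx]
    y_val = [y[i] for i in val_idx]
    # Oversampling happens after the split, so validation rows are never duplicated.
    X_train, y_train = random_oversample(X_train, y_train, seed)
    return X_train, y_train, X_val, y_val


X = [[i] for i in range(100)]                  # toy feature rows
y = [1 if i < 10 else 0 for i in range(100)]   # 10% positives
X_tr, y_tr, X_va, y_va = split_then_oversample(X, y)
print(Counter(y_tr), Counter(y_va))
```

Once model selection is done on the validation set, the same recipe (resample the labelled data, then fit) can be applied to the full training data before predicting on the unlabelled test data.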


0 answers

Why can't we calculate an ROC curve in cost-sensitive learning?

In the Applied Predictive Modeling book, in the section on the cost-sensitive learning approach, the author(s) write: One consequence of this approach is that class probabilities cannot be generated for the model, at least in the available implementation. Therefore we…

roc auc imbalanced-data cohen-kappa

asked Sep 21 '22 at 03:23

Gioi Hoc Sinh


1 answer

Poorly calibrated probabilities but good classification in confusion matrix

I have an imbalanced data set. My goal is to balance sensitivity and specificity via the confusion matrix. I used glmnet in R with class weights. The model does well at balancing sensitivity and specificity, but I looked at the calibration plot, and…

classification r-caret confusion-matrix calibration imbalanced-data

asked Sep 18 '22 at 22:16

mapleleaf


2 answers

Imbalanced data set score after SMOTE

Is it correct to use accuracy as a metric for an imbalanced data set after applying oversampling methods such as SMOTE, or do we have to use other metrics such as AUROC or precision/recall-based metrics?

python imbalanced-data

asked Sep 15 '22 at 07:11

Mo Amini
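Oversampling changes only the training data; the validation or test set keeps its original imbalance, so accuracy measured there can still be dominated by the majority class. The small worked example below, on a hypothetical 95:5 validation set, shows a model that never predicts the minority class scoring 95% accuracy with zero recall. The helper functions are written out for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """True positives divided by all actual positives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical untouched validation set: 95 negatives, 5 positives.
y_val = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_val, y_pred))  # 0.95
print(recall(y_val, y_pred))    # 0.0
```

This is why per-class metrics (recall, precision, F1) or threshold-free metrics such as AUROC or the area under the precision-recall curve are usually reported alongside, or instead of, accuracy.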


1 answer

How to handle unbalanced labels in multilabel classification?

These oversimplified example target vectors (in my use case each 1 represents a product that a client bought at least once a…

machine-learning keras multilabel-classification imbalanced-data

asked Sep 09 '22 at 17:38

Viktor


1 answer

How to handle input value error when using under sampling methods from imblearn?

Thank you for your help in advance. I am trying to use the RandomUnderSampler() and fit_sample() methods from imblearn to balance a botnet dataset with two missing values. The dataset contains a label column for binary classification that uses 0 and…

python azure machine-learning imbalanced-data

asked Sep 09 '22 at 16:02

Ghada


0 answers

SMOTENC for imbalanced multiclass classification using a pipeline gives NaN values

I am using a dataset with null values and a mix of categorical and continuous data. Initially, I replaced the null values in certain columns and then used SMOTENC in the pipeline with StratifiedKFold. The accuracy and ROC score are always…

machine-learning pipeline nan imbalanced-data smote

asked Sep 04 '22 at 18:20

shirina


0 answers

Using Class Weights or Sample Weights for One-Hot Encoded labels with keras models

I want to use class weights or sample weights to balance my data during model training. My dataset is an image dataset with 20 classes in total, and it is highly imbalanced. I have created a data loader that loads multiple-images…

python keras one-hot-encoding imbalanced-data

asked Sep 03 '22 at 15:36

stackersTech101
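For the class-weight question above, one common heuristic weights each class by n_samples / (n_classes * count_of_class), which is the formula scikit-learn uses for `class_weight="balanced"`. The sketch below recovers integer class ids from one-hot rows and computes such a dictionary; the helper names and the toy 3-class labels are hypothetical. A dictionary keyed by class index is the form Keras's `Model.fit(class_weight=...)` accepts; with a custom data loader, per-sample weights derived from it can instead be yielded as a third element alongside inputs and targets.

```python
from collections import Counter


def integer_labels(one_hot_rows):
    """Recover integer class ids from one-hot rows (position of the largest entry)."""
    return [row.index(max(row)) for row in one_hot_rows]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Hypothetical one-hot targets: 6 samples of class 0, 3 of class 1, 1 of class 2.
one_hot = [[1, 0, 0]] * 6 + [[0, 1, 0]] * 3 + [[0, 0, 1]] * 1
labels = integer_labels(one_hot)
weights = balanced_class_weights(labels)
print(weights)
```

With these weights every class contributes the same total weight (here 10 / 3 each), so rare classes are no longer drowned out in the loss.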


0 answers

Using SMOTE for BERT inputs

I have some imbalanced data which I need to classify. I want to use SMOTE to balance it. But I don't really understand how to use it since I have BERT multiple inputs. Do I need to use it for input_ids? Or attention_masks? Or both? Also, a piece of…

nlp bert-language-model imbalanced-data smote

asked Jul 01 '22 at 10:14

atlas


1 answer

Oversampling on binary classification

I am doing binary classification on a huge dataset (190 columns, 500K records). The target values are 0 and 1. However, when I do the oversampling with SMOTE, new target values in the y-vector are created (0, 1, 2 for example). I do not…

python tensorflow imbalanced-data smote oversampling

asked Jun 29 '22 at 17:29

Johnny Torres


1 answer

Imbalanced classification with xgboost in python with scale_pos_weight not working properly

I am using XGBoost with Python to perform a binary classification in which class 0 appears roughly 9 times more frequently than class 1. I am of course using scale_pos_weight=9. However, when I perform the prediction on the testing…

python xgboost imbalanced-data

asked Jun 29 '22 at 02:09

donut
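The value 9 in the question above follows the usual starting point suggested in the XGBoost parameter documentation: scale_pos_weight = (number of negative instances) / (number of positive instances). The sketch below computes that ratio from a 0/1 label list; the function name and the 900/100 label split are hypothetical.

```python
def suggested_scale_pos_weight(y):
    """Return (#negatives / #positives) for a list of 0/1 labels."""
    n_pos = sum(1 for t in y if t == 1)
    n_neg = sum(1 for t in y if t == 0)
    if n_pos == 0:
        raise ValueError("no positive examples")
    return n_neg / n_pos


y_train = [0] * 900 + [1] * 100  # hypothetical 9:1 class split
print(suggested_scale_pos_weight(y_train))  # 9.0
```

Note that this setting only reweights positive examples in the training loss. As far as we know, hard class predictions from the classifier's `predict` still apply a fixed 0.5 cutoff to the predicted probability, and the reweighting also shifts those probabilities, so tuning the decision threshold on validation data is a separate, often necessary, step.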


1 answer

When I use imblearn pipeline instead of sklearn pipeline all textual features disappear. Any solution?

This is my code below. I need to use SMOTENC to balance the dataset, which means I have to use the pipeline from the imblearn library. However, it does not recognize the CountVectorizer features from imblearn.pipeline import Pipeline # from…

python scikit-learn pipeline imbalanced-data smote

asked Jun 06 '22 at 02:12

Ahmad Abdel-Hafez


1 answer

Can I use RandomUnderSampler for categorical data as well?

AFAIK, unlike SMOTE, RandomUnderSampler selects a subset of the data. But I am not confident about using it for categorical data. Is it really applicable to categorical data?

python machine-learning classification imbalanced-data

asked May 22 '22 at 17:38

user14596364
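The distinction drawn in the question above can be illustrated with a sketch of random undersampling: each larger class is reduced to a random subset of its own rows, and no feature value is ever computed or interpolated, so string-valued categorical features pass through unchanged. This is a plain-Python illustration under that assumption, not imbalanced-learn's `RandomUnderSampler` itself; the helper name and toy rows are hypothetical. SMOTE, by contrast, interpolates feature values, which is why imbalanced-learn offers the SMOTENC and SMOTEN variants for categorical features.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so that every class ends up with
    as many rows as the smallest class. Rows are copied unchanged."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    kept_rows, kept_labels = [], []
    for label in counts:
        members = [r for r, t in zip(rows, labels) if t == label]
        for r in rng.sample(members, target):  # sample without replacement
            kept_rows.append(r)
            kept_labels.append(label)
    return kept_rows, kept_labels


# Hypothetical purely categorical features: (colour, size).
rows = [("red", "S"), ("blue", "M"), ("red", "L"), ("green", "M"), ("blue", "S"), ("red", "M")]
labels = [0, 0, 0, 0, 1, 1]
kept_rows, kept_labels = random_undersample(rows, labels)
print(kept_rows, kept_labels)
```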


1 answer

Improving classification performance for severely imbalanced data with an abnormally skewed distribution

I have a large dataset D which I balanced using an undersampling method, RandomUnderSampler from the imblearn package, which reduces the majority class. The data have three classes: Yes (1), No (0), Unfinished (2). This is the minimal 3d…

python machine-learning deep-learning classification imbalanced-data

asked May 17 '22 at 14:47

Shihab Ullah


2 answers

'BalancedBaggingClassifier' object has no attribute 'n_features_in_'

I am working on an imbalanced multi-class dataset. I am trying to pass it into a BalancedBaggingClassifier, but I keep getting the error below. Code: import pandas as pd dataframe = pd.read_excel('mergedDataset.xlsx') from sklearn import…

python imbalanced-data

asked May 14 '22 at 22:41

Aya Lihoum
