Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs in machine learning when:

"The user assigns more importance to the predictive performance... on a subset of the target variable domain."

"[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro (2016). A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, 49(2).

Software

imbalanced-learn: imblearn

Related Tags and Techniques

SMOTE: smote (Synthetic Minority Oversampling Technique)
Resampling: resampling
Oversampling: oversampling
Downsampling: downsampling

351 questions

0 answers

How to make soft accuracy and loss curves in deep learning models

There is an imbalanced two-class classification problem with 12750 samples for class 0 and 2550 samples for class 1. I've obtained class weights using class_weight.compute_class_weight and fed them to model.fit. I've tested many loss and optimizer…

tensorflow deep-learning imbalanced-data

asked Jul 27 '21 at 21:51

afrah
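
For the weighting approach described in this question, here is a minimal sketch of what scikit-learn's compute_class_weight with class_weight="balanced" returns for these counts. The y_train array is a stand-in built from the quoted counts (an assumption); the model itself is omitted.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Labels matching the counts in the question: 12750 of class 0, 2550 of class 1.
y_train = np.array([0] * 12750 + [1] * 2550)

# 'balanced' weights are n_samples / (n_classes * count_per_class).
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)

# Keras' model.fit expects a dict mapping class index -> weight.
class_weight = dict(zip(classes.tolist(), weights.tolist()))
print(class_weight)  # {0: 0.6, 1: 3.0}
```

The dict would then be passed as model.fit(..., class_weight=class_weight). With these counts each class-1 sample weighs five times as much as a class-0 sample (3.0 / 0.6 = 5), which offsets the 12750 : 2550 = 5 : 1 ratio.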

0 answers

Distinct SVM models giving exactly the same results in R

I'm comparing the predictive power between two Support Vector Machine models in R. I have 6 response variables (categorical) and 24 predictor variables. In one of the models I'm using my data with unbalance between the response variables and in…

r machine-learning svm prediction imbalanced-data

asked Jul 15 '21 at 19:30

Rafael Nakamura

1 answer

Imbalanced multiclass classification dataset: undersample or oversample?

The dataset has around 150k records with four labels, ['A', 'B', 'C', 'D'], distributed as follows: A: 60,000; B: 50,000; C: 36,000; D: 4,000. I notice that when using the classification report to get the precision, recall, and f1-score, the f1-score…

python multilabel-classification imbalanced-data

asked May 17 '21 at 04:11

mathgeek

0 answers

Are oversampling and undersampling approaches good to build good models?

I just worked on the "Heart Failure Prediction" dataset from Kaggle (https://www.kaggle.com/andrewmvd/heart-failure-clinical-data), and I noticed the number of "Not dead" was greater than the number of "dead", so I used SMOTETomek, which resampled my data…

python data-science kaggle imbalanced-data oversampling

asked May 06 '21 at 07:55

Jack Froster

0 answers

LightGBM fails to predict on validation set (R)

I have big trouble implementing LightGBM on an extremely imbalanced dataset (using R). Indeed, I'm dealing with a binary classification problem and the distribution of the target variable is about 1:800 (approx. Class 0: 110,000; Class 1: 140). I…

r resampling lightgbm imbalanced-data smote

asked Apr 28 '21 at 14:50

CCbs

1 answer

Difference between imblearn pipeline and Pipeline

I wanted to use sklearn.pipeline instead of imblearn.pipeline to incorporate `RandomUnderSampler()`. My original data requires missing value imputation and scaling. Here I have breast cancer data as a toy example. However, it gave me the…

python machine-learning scikit-learn pipeline imbalanced-data

asked Apr 20 '21 at 18:57

ForestGump

0 answers

How to deal with imbalanced datasets in neural network training

I am struggling even to explain my question briefly but clearly, so I'll do my best to provide some background information before jumping directly into the question. Background: I have a very imbalanced dataset that has 3 classes, which…

python tensorflow keras imbalanced-data

asked Apr 17 '21 at 23:04

xerac

1 answer

How to correct Python Attribute error: 'SMOTE' object has no attribute 'fit_sample'

Hello: I am trying to run the following code:

    os = SMOTE(random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    columns = X_train.columns
    os_data_X,os_data_y=os.fit_sample(X_train, y_train)

But get…

python imbalanced-data smote oversampling

asked Apr 14 '21 at 21:01

JWeds

0 answers

Low G-mean and MCC for binary classification of imbalanced data

I have artificially increased the imbalance ratio to show the impact of different popular scoring metrics on the classification performance. Also, I have artificially added some missing values to check that my pipeline is working properly. However, I…

machine-learning scikit-learn classification imbalanced-data

asked Apr 08 '21 at 18:50

ForestGump
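
For reference, here is a small worked example of the two metrics on hand-picked toy labels (an assumption, not the question's data). It computes MCC with scikit-learn and G-mean as the geometric mean of per-class recall; imbalanced-learn also provides imblearn.metrics.geometric_mean_score for the latter.

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef, recall_score

# Toy labels giving TP=1, TN=3, FP=1, FN=1 for the positive class 1.
y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0])

# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
#     = (3 - 1) / sqrt(2 * 2 * 4 * 4) = 0.25
mcc = matthews_corrcoef(y_true, y_pred)

# Per-class recall: specificity 0.75 for class 0, sensitivity 0.5 for class 1.
recalls = recall_score(y_true, y_pred, average=None)
g_mean = np.prod(recalls) ** (1 / len(recalls))

print(mcc)     # 0.25
print(g_mean)  # about 0.612
```

Both metrics drop sharply when the minority class is poorly recalled, even if overall accuracy (4/6 here) looks acceptable.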

1 answer

Appropriate f1 scoring for highly imbalanced data

I am confused by the different f1 computations. Which f1 scoring should I use for severely imbalanced data? I am working on a severely imbalanced binary classification problem. 'f1', 'f1_micro', 'f1_macro', 'f1_weighted'. Also, I want to add…

python machine-learning scikit-learn classification imbalanced-data

asked Apr 06 '21 at 18:45

ForestGump
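
The difference between the averaging modes is easiest to see on a toy example. The sketch below uses hand-picked labels (an assumption) and f1_score; the scorer strings 'f1', 'f1_micro', 'f1_macro', and 'f1_weighted' correspond to average='binary', 'micro', 'macro', and 'weighted'.

```python
from sklearn.metrics import f1_score

# Toy labels: 4 negatives, 2 positives.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

scores = {
    avg: f1_score(y_true, y_pred, average=avg)
    for avg in ("binary", "micro", "macro", "weighted")
}
for avg, value in scores.items():
    print(f"{avg}: {value:.4f}")

# binary: 0.5000    F1 of the positive class (pos_label=1) only
# micro: 0.6667     pooled TP/FP/FN over both classes; equals accuracy here
# macro: 0.6250     unweighted mean of per-class F1 (0.75 and 0.5)
# weighted: 0.6667  per-class F1 weighted by support (4 and 2)
```

For a severely imbalanced binary problem where the minority class is the one of interest, 'f1' or 'f1_macro' are the usual candidates; 'f1_micro' equals accuracy for single-label data and can look good even when the minority class is mostly missed.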

0 answers

What faster alternatives to SMOTE for imbalanced large data set are there in R?

I have a training set of 260,000 observations and 30 IVs and with binary class imbalance 1:6 (yes, it does mess up models' performance), but using SMOTE isn't an option, since it takes forever on my laptop with this amount of data. Is there any…

performance sampling imbalanced-data

asked Mar 28 '21 at 13:41

user000

1 answer

In R, how do I run a balanced 10-fold CV information gain test for feature selection on imbalanced 2-class data?

I have a large training data set data.trn of 260,000+ observations on 50+ variables, with dependent variable loan_status consisting of 2 classes "paid off" and "default" with a respective imbalance of about 5:1. I want to use information.gain command…

r validation sapply imbalanced-data information-gain

asked Mar 26 '21 at 08:33

user000
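
The question is about R, but the underlying idea can be sketched in Python: stratified 10-fold splitting keeps the 5:1 ratio in every fold, and feature scores are computed on the training folds only. The synthetic data is an assumption, and scikit-learn's mutual_info_classif (an estimate of mutual information, which is what information gain measures) stands in for R's information.gain; the two estimators are not identical.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in with the question's 5:1 ratio: 500 "paid off" (0), 100 "default" (1).
rng = np.random.default_rng(0)
y = np.array([0] * 500 + [1] * 100)
X = rng.normal(size=(len(y), 8))
X[:, 0] += 2 * y  # make feature 0 informative; the rest are pure noise

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in skf.split(X, y):
    # Every held-out fold keeps the 5:1 ratio (50 vs 10 here).
    # Features are scored on the training folds only.
    mi = mutual_info_classif(X[train_idx], y[train_idx], random_state=0)
    fold_scores.append(mi)

mean_mi = np.mean(fold_scores, axis=0)
print(np.argsort(mean_mi)[::-1])  # feature indices ranked by average mutual information
```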

0 answers

Feature reduction or class imbalance handling: which has to be performed first?

I am working on the feature extraction and class imbalance problems, but need advice on which one to perform first? Feature reduction/selection or to handle class imbalance first?

machine-learning feature-extraction imbalanced-data data-preprocessing

asked Mar 19 '21 at 05:53

Nasreen Devops

1 answer

cannot import name 'SMOTEN' from 'imblearn.over_sampling'

SMOTE and SMOTENC are working, but I am unable to use SMOTEN. I tried the solution in this. But still, only for SMOTEN it returns the error ImportError: cannot import name 'SMOTEN' from 'imblearn.over_sampling'. I am using Jupyter Notebook and below is the…

python jupyter imbalanced-data imblearn smote

asked Mar 16 '21 at 13:07

DOT

1 answer

Max_samples hyperparameter in PU bagging for highly imbalanced dataset

I am using the credit card fraud dataset (link below), and it's highly imbalanced: the positive class has only 492 instances and the negative class has 284315 instances. I was applying PU Bagging (link below) on it to extract hidden positives in…

machine-learning scikit-learn random-forest imbalanced-data

asked Mar 15 '21 at 06:37

a.ydv
