Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs in machine learning when:

"The user assigns more importance to the predictive performance... on a subset of the target variable domain."

"[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro (2016). A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, 49(2).
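One consequence of this definition is that plain accuracy can hide poor performance on the cases the user cares about. The sketch below is a minimal illustration using made-up labels with an 84/16 split. A trivial "model" that always predicts the majority class scores high accuracy but finds none of the minority cases.

```python
# Synthetic labels (made up for illustration): 84 majority (0) and 16 minority (1) cases.
y_true = [0] * 84 + [1] * 16

# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of all predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: fraction of true 1s that were predicted as 1.
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
minority_recall = true_pos / sum(t == 1 for t in y_true)

print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
# prints: accuracy=0.84, minority recall=0.00
```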

Software

imbalanced-learn: imblearn

Related Tags and Techniques

SMOTE: smote (Synthetic Minority Oversampling Technique)
Resampling: resampling
Oversampling: oversampling
Downsampling: downsampling
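The resampling ideas listed above can be sketched without any library. The block below is a simplified illustration, not the imbalanced-learn implementation. Random undersampling keeps a random subset of majority-class rows. A SMOTE-style step creates each synthetic minority row by interpolating between a minority row and one of its k nearest minority neighbours. The function names `random_undersample` and `smote_like`, the default `k=3`, and the toy data are hypothetical choices made for this sketch. In imblearn itself, the corresponding tools are classes such as `RandomUnderSampler` and `SMOTE`, used through `fit_resample`.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, majority_label, n_keep, seed=0):
    """Hypothetical helper: keep every non-majority row plus n_keep random majority rows."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    other_idx = [i for i, label in enumerate(y) if label != majority_label]
    kept = sorted(other_idx + rng.sample(majority_idx, n_keep))
    return [X[i] for i in kept], [y[i] for i in kept]


def smote_like(X, y, minority_label, n_new, k=3, seed=0):
    """Hypothetical helper: append n_new synthetic minority rows, each interpolated
    between a random minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (Euclidean distance), excluding itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from base to nn
        X_out.append([b + gap * (q - b) for b, q in zip(base, nn)])
        y_out.append(minority_label)
    return X_out, y_out


# Toy 2-D data (made up): 8 majority rows (label 0), 4 minority rows (label 1).
X = [[float(i), 0.0] for i in range(8)] + [[0.0, 5.0], [1.0, 5.0], [0.0, 6.0], [1.0, 6.0]]
y = [0] * 8 + [1] * 4

X_under, y_under = random_undersample(X, y, majority_label=0, n_keep=4)
print(Counter(y_under))  # Counter({0: 4, 1: 4})

X_over, y_over = smote_like(X, y, minority_label=1, n_new=4)
print(Counter(y_over))  # Counter({0: 8, 1: 8})
```

Undersampling discards majority-class rows, while the interpolated rows always lie on segments between existing minority rows. Whichever tool is used, resampling is normally applied to the training split only, so that validation and test metrics reflect the original class distribution.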

351 questions

1 answer

Reason for the different results of the KNN algorithm from the PyOD and sklearn packages

Besides this post, I experimented with KNN algorithms using the sklearn and PyOD packages for an unsupervised approach on a benchmark dataset for an anomaly detection task, and I get different…

python machine-learning knn anomaly-detection imbalanced-data

asked Apr 12 '22 at 17:58

Mario

0 answers

Binary search tree: how to update and calculate the imbalance (Python)

I am building a binary search tree, and I want to update the imbalance when I add a child, using this function in the add_child function. But now I have run into a problem. Can someone tell me what is wrong? Thank you very much! And it is correct to…

tree binary binary-tree binary-search-tree imbalanced-data

asked Mar 30 '22 at 23:47

Ashhhh

1 answer

Get error: unexpected keyword argument 'random_state' when using TomekLinks

My code is:

undersample = TomekLinks(sampling_strategy='majority', n_jobs=-1, random_state=42)
X_tl, y_tl = undersample.fit_resample(X, y)

When I run it, I get this error: TypeError: __init__() got an unexpected keyword argument…

python data-processing imbalanced-data

asked Mar 24 '22 at 14:11

Amit S

1 answer

Warning Message in binary classification model Gaussian Naive Bayes?

I am using a multiclass classification-ready dataset with 14 continuous variables and classes from 1 to 10. This is the data file: https://drive.google.com/file/d/1nPrE7UYR8fbTxWSuqKPJmJOYG3CGN5y9/view?usp=sharing My goal is to apply the…

python machine-learning scikit-learn naivebayes imbalanced-data

asked Mar 20 '22 at 07:16

Piers

1 answer

matplotlib: histogram of SMOTEd class distribution showing colored synthetic region

Say I have a binary imbalanced dataset like so:

from collections import Counter
from sklearn.datasets import make_classification
from matplotlib import pyplot as plt
from imblearn.over_sampling import SMOTE

# fake dataset
X, y =…

python matplotlib data-visualization imbalanced-data smote

asked Mar 08 '22 at 19:44

user12587364

0 answers

Predictions stuck at zero when positive label (1) is only 16% of data

When I run the same code with a 50/50 split of labels 0 and 1, I get about 70% accuracy on the validation set and my validation predictions are not stuck at 0. However, when I run the code on a dataset with an 84/16% split of labels 0 and 1, all my validation predictions end up being 0. I…

python deep-learning pytorch loss-function imbalanced-data

asked Feb 28 '22 at 16:38

Mona Jalal

0 answers

Multiclass Sampling Strategy

Scenario: I am currently working on a multiclass classification problem. I have a historical dataset of 2 million records with 180 classes and need to create a model that will predict the classes accurately. I have created a model using HybridGradientboosting…

python imbalanced-data downsampling imblearn oversampling

asked Feb 23 '22 at 08:33

Makarand Rayate

4 answers

How to handle imbalanced data in general

I have been working on a case study where the data is highly imbalanced. We have been taught that we can handle imbalanced data by either undersampling the majority class or oversampling the minority class. I wanted to ask if there is any other…

machine-learning data-science imbalanced-data

asked Feb 13 '22 at 14:11

Gitanjali M

0 answers

Samples with almost identical features but different classes, and poor classification performance (recall and precision)

I have 77,000 text samples, of which 4,900 are positive and about 72,000 are negative (binary classification), and the maximum length of these samples is 15 (these samples are sentences). Not only is the data imbalanced, but also positive…

lstm precision similarity imbalanced-data

asked Feb 01 '22 at 18:53

soheila

0 answers

True Negatives have better prediction than True Positives

I have applied logistic regression to data containing both binary and numerical predictors with a binary target. In the resulting confusion matrix, true negatives (65%) and false positives (>20%) are both higher than true positives (8%). I need…

python logistic-regression binary-data precision-recall imbalanced-data

asked Jan 30 '22 at 18:22

LBala

1 answer

How can I resolve imbalanced datasets for AutoML classification on GCP?

I am planning to use AutoML for the classification of my tabular data. But there is a moderate imbalance in the target variable. When running my own model, I would either upsample, downsample or build synthetic samples to resolve the imbalance. Is…

google-cloud-platform classification google-cloud-automl imbalanced-data

asked Jan 21 '22 at 01:03

Avantika Banerjee

0 answers

AUROC for imbalanced dataset

This is my first question here and I hope you can help me. At the moment I'm training a binary classifier for medical images, and my dataset is imbalanced with a ratio of roughly 0.8 (negative) to 0.2 (positive). My code is written with PyTorch and…

pytorch metrics evaluation imbalanced-data pytorch-lightning

asked Jan 11 '22 at 15:50

user123

0 answers

Does model underfitting based on accuracy matter for imbalanced data?

I am training a deep learning model on imbalanced data for binary classification. I used binary_crossentropy as the loss function and accuracy as the metric. When I plotted the loss, it showed underfitting. Is that a problem, as my data is…

deep-learning imbalanced-data overfitting-underfitting

asked Jan 08 '22 at 07:14

Siasma

1 answer

Should we actively use the weight argument in loss functions?

Most current machine learning libraries have loss functions that come with a weight argument, which allows us to tackle unbalanced datasets. However, should this feature be actively used? If not, are there certain guidelines as to when…

pytorch artificial-intelligence loss imbalanced-data

asked Jan 03 '22 at 03:26

tangolin

1 answer

Binary data, but the oversampler states it is multilabeled

I am using Kaggle's Twitter dataset and I am trying to oversample the minority class. Despite y being binary, the oversampler returns an error stating that it is multiclass. My x and y are the tweets and the labels, respectively.

scikit-learn nlp imbalanced-data

asked Dec 20 '21 at 02:51

Randy Chng
