Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs in machine learning when:

"The user assigns more importance to the predictive performance... on a subset of the target variable domain."

"[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro (2016). A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.

Software

imbalanced-learn: imblearn

Related Tags and Techniques

SMOTE: smote (Synthetic Minority Oversampling Technique)
Resampling: resampling
Oversampling: oversampling
Downsampling: downsampling
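
The related techniques listed above can be sketched without any library. The minimal Python example below assumes plain lists of numeric feature rows and at least two minority samples. It shows random oversampling, random downsampling, and the single interpolation step at the heart of SMOTE. All function names are hypothetical. In practice, imbalanced-learn provides maintained versions (RandomOverSampler, RandomUnderSampler and SMOTE), and its SMOTE generates many synthetic points per call, using k = 5 neighbours by default.

```python
import math
import random


def group_by_class(X, y):
    """Map each label to the list of feature rows that carry it."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_downsample(X, y, seed=0):
    """Keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


def smote_style_point(minority, k=5, seed=0):
    """Create one synthetic minority point the way SMOTE does: pick a
    minority sample, pick one of its k nearest minority neighbours, and
    interpolate at a random position on the segment between them."""
    rng = random.Random(seed)
    i = rng.randrange(len(minority))
    base = minority[i]
    others = [p for j, p in enumerate(minority) if j != i]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # uniform in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [5.0, 5.0], [5.5, 5.2]]
y = [0, 0, 0, 0, 1, 1]
X_over, y_over = random_oversample(X, y)
X_down, y_down = random_downsample(X, y)
synthetic = smote_style_point([row for row, label in zip(X, y) if label == 1])
print(y_over.count(0), y_over.count(1))  # 4 4
print(y_down.count(0), y_down.count(1))  # 2 2
```

Whichever variant is used, resampling is normally applied to the training split only, so that the test set keeps the real class distribution.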

351 questions

0 answers

Implementing a dynamic radius in a radius neighbors classifier for better class-imbalance handling

I am trying to create a dynamic-radius version of the radius neighbors classifier for a multiclass classification problem. The dataset has 7 classes. I am giving a different radius to each class and then passing it to the radius neighbors classifier. My…

python machine-learning scikit-learn imbalanced-data

asked Jun 13 '23 at 10:38

sheldon cooper

0 answers

Class weights using a custom generator function in Keras

I have written the following generator function. When I try to use class weights with this generator, I receive an error when I run the following commands: training_generator=image_generator(partition['train'], labels, bat_siz ) counter =…

keras generator imbalanced-data

asked Jun 02 '23 at 16:18

Akanksha Pathak
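
For the Keras class-weights question above, a common starting point is the "balanced" heuristic, n_samples / (n_classes * count_of_class). This is the formula scikit-learn's compute_class_weight uses for class_weight="balanced". The sketch below assumes integer labels and only illustrates the arithmetic; the helper names are hypothetical. Keras' Model.fit accepts such a dict through its class_weight argument. A generator can alternatively yield (inputs, targets, sample_weights) tuples. Which route works with a custom generator may depend on the Keras version.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count_of_class), so
    that rare classes contribute as much total weight as common ones."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in sorted(counts.items())}


def batch_sample_weights(batch_labels, class_weight):
    """Turn a class-weight dict into one weight per sample, e.g. to yield
    (x_batch, y_batch, weights) from a hand-written generator."""
    return [class_weight[label] for label in batch_labels]


labels = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.6666666666666666, 1: 2.0}
print(batch_sample_weights([1, 0, 1], weights))  # [2.0, 0.6666666666666666, 2.0]
```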

0 answers

Multi-class oversampling based online bagging

Can anyone give me MOOB code in Python? Multi-class classification problems are often considered more challenging than their binary counterparts, because multiple classes can increase the data complexity and aggravate the imbalanced distribution. A…

multiclass-classification ensemble-learning imbalanced-data oversampling

asked May 22 '23 at 15:01

gul

1 answer

Are mlr3 class weights applied to validation score calculations?

I have previously used mlr3 for imbalanced classification problems, and used PipeOpClassWeights to apply class weights to learners during training. This pipe op adds a column of observation weights to the Task, in the Task$weights property. These…

r validation classification imbalanced-data mlr3

asked May 17 '23 at 12:18

AhmetZamanis

0 answers

How to customize the Dice Loss function to replace the nll_loss function in PyTorch?

Assume the prediction result is pred and the corresponding label variable is label_face. Because label_face is highly imbalanced in this segmentation problem, I want to use the Dice Loss function to…

pytorch customization loss-function semantic-segmentation imbalanced-data

asked May 06 '23 at 17:09

Quoc-Duong NGUYEN
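
For a single class, the Dice loss asked about above is 1 − (2·Σ p·g + ε) / (Σ p + Σ g + ε). Here p are the predicted probabilities, g is the 0/1 ground truth, and ε is a small smoothing constant. The framework-agnostic Python sketch below only shows that arithmetic on flattened lists. The function names and the per-class averaging for several classes are assumptions, not the asker's code. In PyTorch, the same sums would be written with tensor operations (for example torch.sum on softmax outputs) so that autograd can back-propagate through them.

```python
def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for one class of a segmentation mask.

    pred:   predicted foreground probabilities in [0, 1], flattened
    target: ground-truth mask of 0/1 values, same length as pred
    eps:    smoothing term that avoids division by zero on empty masks
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice


def multiclass_dice_loss(prob_maps, label_map, n_classes, eps=1e-6):
    """Average the soft Dice loss over classes, one-vs-rest.

    prob_maps: one list of class probabilities per pixel (e.g. softmax output)
    label_map: one integer class label per pixel
    """
    losses = []
    for c in range(n_classes):
        pred_c = [probs[c] for probs in prob_maps]
        target_c = [1.0 if label == c else 0.0 for label in label_map]
        losses.append(soft_dice_loss(pred_c, target_c, eps))
    return sum(losses) / n_classes


print(soft_dice_loss([0.9, 0.1, 0.2, 0.8], [1, 0, 0, 1]))
```

Because the loss is built from overlap with the (possibly rare) foreground, a small foreground class is not swamped by the many background pixels the way it can be in a per-pixel average.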

0 answers

Select the maximum number of rows so that the sum of the columns is balanced

Suppose I have a table with the following columns (and many more rows):

Id  n_positive_class1  n_positive_class2  n_positive_class3
1   0                  10                 4000
2   122                0                  0
3   4                  5234               0

I'd like to select the maximum number of rows (by Id) so that the sum of…

python pandas numpy imbalanced-data

asked May 05 '23 at 08:16

user11696358

0 answers

Improving classification model (f1_score): real images vs generative images (fake)

I am working on a model to classify images into 2 classes: real and generative (fake). I can't get an f1_score higher than 0.85. Any recommendations on how to improve the score? The dataset contains 4000 real images (4000, 1200) and 2000 fake images…

generative-adversarial-network image-classification imbalanced-data generative

asked Apr 26 '23 at 15:59

Martin

1 answer

Which evaluation metric will be suitable for a classification problem with an imbalanced dataset?

I have class X with 1000 observations and class Y with 2000 observations. I am trying to decide which classification evaluation metric is most appropriate here, and why: precision-recall curve, ROC AUC, simple accuracy, confusion matrix and…

machine-learning classification metrics evaluation imbalanced-data

asked Apr 21 '23 at 10:45

benbitdiddle
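
The 1000 vs. 2000 split described above shows why plain accuracy can mislead. A model that always predicts the majority class Y is already right two thirds of the time. Balanced accuracy, the mean of the per-class recalls as in scikit-learn's balanced_accuracy_score, stays at chance level for the same model. The minimal sketch below assumes string labels and uses hypothetical function names.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally
    regardless of how many observations it has."""
    recalls = []
    for c in sorted(set(y_true)):
        hits = [p == c for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(hits) / len(hits))
    return sum(recalls) / len(recalls)


# Class sizes from the question: 1000 observations of X, 2000 of Y.
y_true = ["X"] * 1000 + ["Y"] * 2000
# A degenerate model that always predicts the majority class Y.
y_pred = ["Y"] * 3000

print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(balanced_accuracy(y_true, y_pred))   # 0.5
```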

0 answers

Re-weight with WeightIt

I weighted my population using the WeightIt package:

library(WeightIt)
library(cobalt)
data("lalonde", package = "cobalt")
W.out <- weightit(treat ~ age + married + race, data = lalonde, estimand = "ATE", method = "ps")
bal.tab(W.out,…

r weighted imbalanced-data propensity-score-matching

asked Apr 16 '23 at 19:39

user19745561

0 answers

After oversampling with SMOTE, my IsolationForest results don't improve

My dataset: the test set has 17565 examples of class 0 and 2435 of class 1; the training set has 70212 examples of class 0 and 9788 of class 1. I applied SMOTE oversampling with the IsolationForest algorithm on the training set only. Results before oversampling: F1 Score: 0.9278732648748262, Accuracy Score…

machine-learning imbalanced-data imblearn oversampling isolation-forest

asked Apr 14 '23 at 13:16

David

0 answers

How to construct an imbalanced MNIST dataset based on a pre-defined Gini coefficient?

My goal is to make different versions of the MNIST dataset with different pre-defined levels of imbalance. A Gini coefficient (range: 0-1) is a measure of the imbalance of a dataset, where 0 represents perfect equality and 1 represents perfect…

python mnist imbalanced-data gini

asked Apr 13 '23 at 18:42

J.G.A. Wijlhuizen
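
A common definition of the Gini coefficient for class counts x_1, ..., x_n is the mean absolute difference Σ_i Σ_j |x_i − x_j| / (2 n² μ), with μ the mean count. With this definition, the maximum for n classes is (n − 1)/n, i.e. 0.9 for MNIST's 10 digits. Reaching exactly 1 would require rescaling by n/(n − 1). The sketch below computes this coefficient. As one possible construction (an assumption, not the only option), it also bisects on a geometric class-size profile proportional to r^c until the target Gini value is hit. The parameter max_per_class = 5000 is an arbitrary choice intended to stay below the size of every MNIST training class. Subsampling each digit to the returned counts is left out.

```python
def gini(counts):
    """Gini coefficient of a class-count vector via the mean absolute
    difference: sum_i sum_j |x_i - x_j| / (2 * n^2 * mean).
    0 means all classes are equally large; with n classes the maximum,
    reached when one class holds every sample, is (n - 1) / n."""
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-empty class")
    mean = total / n
    abs_diffs = sum(abs(a - b) for a in counts for b in counts)
    return abs_diffs / (2 * n * n * mean)


def geometric_counts_for_gini(target, n_classes=10, max_per_class=5000, iters=60):
    """Hypothetical helper: give class c a size proportional to r**c and
    bisect on r in [0, 1] until the Gini coefficient matches `target`.
    Smaller r means a steeper decay and therefore a larger Gini value."""
    if not 0.0 <= target < (n_classes - 1) / n_classes:
        raise ValueError("target must lie in [0, (n - 1) / n)")
    lo, hi = 0.0, 1.0  # Gini is maximal at r = 0 and zero at r = 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gini([mid ** c for c in range(n_classes)]) > target:
            lo = mid  # too imbalanced: flatten the profile
        else:
            hi = mid  # not imbalanced enough: steepen the profile
    r = (lo + hi) / 2
    return [round(max_per_class * r ** c) for c in range(n_classes)]


counts = geometric_counts_for_gini(0.5)
print(counts, round(gini(counts), 3))
```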

0 answers

Is it bad to have high precision, recall, and F-beta on a 1:5 imbalanced dataset?

In my research, I use a random forest to distinguish whether data is bot-generated or human-generated. The machine learning model achieved extremely high accuracy; here is the result: Confusion matrix: [[420 8] [40 20]] Precision: …

machine-learning random-forest imbalanced-data

asked Apr 13 '23 at 17:05

das
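
Reading the confusion matrix quoted above requires knowing its layout. Assume the scikit-learn convention: rows are true classes, columns are predictions, and the positive class comes second. Under that assumption, the counts are TN = 420, FP = 8, FN = 40 and TP = 20. The hypothetical helper below derives the usual metrics from those four numbers, using F-beta = (1 + β²)·TP / ((1 + β²)·TP + β²·FN + FP).

```python
def metrics_from_confusion(cm, beta=1.0):
    """Binary metrics from a 2x2 confusion matrix laid out as
    [[TN, FP], [FN, TP]]: rows = true class, columns = predicted class,
    positive class second (the scikit-learn confusion_matrix layout)."""
    (tn, fp), (fn, tp) = cm
    b2 = beta ** 2
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f_beta": (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp),
    }


cm = [[420, 8], [40, 20]]  # matrix quoted in the question
for name, value in metrics_from_confusion(cm).items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.902
# precision: 0.714
# recall: 0.333
# f_beta: 0.455
print(f"F2: {metrics_from_confusion(cm, beta=2.0)['f_beta']:.3f}")  # F2: 0.373
```

Under this reading, accuracy is about 0.90 while only 20 of the 60 positive cases are found (recall ≈ 0.33). This is the typical pattern where accuracy looks high on imbalanced data.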

0 answers

How to generalize the SMOTE algorithm using Weka?

I am working on an imbalanced dataset, and I use SMOTE to balance the data. I build my model using Weka. I want to make SMOTE part of my model, so that when I apply my model to another imbalanced dataset, the model can increase the number of…

weka imbalanced-data smote

asked Apr 08 '23 at 11:09

Muneera

0 answers

In an imbalanced classification problem, is it fair to balance the test data by removing some negative examples?

I'm working on an imbalanced classification problem. Regarding the evaluation on the test set, I have the following questions: Is it fair practice to use a balanced test set for evaluation? To create a balanced test set, I'm removing the negative examples…

classification data-mining imbalanced-data

asked Mar 30 '23 at 20:06

suresh kumar A
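
For the balanced-test-set question above, it helps to note that recall and the false-positive rate are each computed within one true class. Randomly removing negatives therefore leaves them unchanged in expectation, whereas precision depends on the class mix. The minimal sketch below uses a hypothetical operating point (TPR = 0.8, FPR = 0.05; these values are not from the question). It applies the identity precision = TPR·π / (TPR·π + FPR·(1 − π)), where π is the fraction of positives in the test set.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Expected precision of a classifier with a fixed true-positive rate
    (recall) and false-positive rate, on a test set in which a fraction
    `prevalence` of the examples are positive."""
    tp = tpr * prevalence          # expected true positives per example
    fp = fpr * (1.0 - prevalence)  # expected false positives per example
    return tp / (tp + fp)


tpr, fpr = 0.8, 0.05  # hypothetical operating point, not from the question
for prevalence in (0.5, 0.2, 0.05):
    print(prevalence, round(precision_at_prevalence(tpr, fpr, prevalence), 3))
```

With these numbers, precision falls from about 0.94 on a 50/50 test set to about 0.46 at 5% prevalence. A balanced test set can therefore overstate the precision to expect when the real class ratio is much more skewed.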

0 answers

Running LightGBM after applying the SMOTE function to mitigate an imbalanced dataset

I need to run a LightGBM model on an imbalanced dataset. The dataset has a binary 'Target' variable: "0" with 61471 records and "1" with 4456 records. To mitigate the imbalance, I run the SMOTE function in the…

r lightgbm imbalanced-data smote

asked Mar 27 '23 at 01:44

Guillermo Mansilla
