Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs when:

  • "The user assigns more importance to the predictive performance... on a subset of the target variable domain."
  • "[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro. (2016) A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.

351 questions
0 votes, 0 answers

Implementing dynamic radius to radius neighbor classifier for better class imbalance handling

I am trying to create a dynamic-radius radius neighbour classifier for a multiclass classification problem. The dataset has 7 classes. I am giving a different radius to each class and then passing it to the radius neighbour classifier. My…
0 votes, 0 answers

class weights using customized generator function keras

I have written the following generator function. When I try to use class weights with this generator, I receive an error from the following commands: training_generator=image_generator(partition['train'], labels, bat_siz ) counter =…
Akanksha Pathak
  • 161
  • 1
  • 5
0 votes, 0 answers

Multi-class oversampling based online bagging

Can anyone give me MOOB code in Python? Multi-class classification problems are often considered more challenging than their binary counterparts because multiple classes can increase the data complexity and aggravate the imbalanced distribution. A…
0 votes, 1 answer

Are mlr3 class weights applied to validation score calculations?

I have previously used mlr3 for imbalanced classification problems, and used PipeOpClassWeights to apply class weights to learners during training. This pipe op adds a column of observation weights to the Task, in the Task$weights property. These…
0 votes, 0 answers

How to customize the Dice Loss function to replace the nll_loss function in pytorch?

I assume the prediction result is pred and the corresponding label variable is label_face. Because label_face is heavily imbalanced in this segmentation problem, I want to use the Dice Loss function to…
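A framework-free sketch of the Dice loss idea from this question, assuming binary segmentation with per-pixel foreground probabilities and 0/1 labels flattened into plain lists. The name `dice_loss` and the `smooth` term are illustrative choices, not taken from the question. PyTorch's `nll_loss` expects log-probabilities, so its inputs would need exponentiating before a Dice-style computation like this one.

```python
def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss for flattened binary predictions.

    probs   -- predicted foreground probabilities in [0, 1]
    targets -- ground-truth labels, 0 or 1
    smooth  -- hypothetical smoothing constant that avoids division by zero
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    # Soft overlap between prediction and ground truth.
    intersection = sum(p * t for p, t in zip(probs, targets))
    # Total predicted mass plus total ground-truth mass.
    denominator = sum(probs) + sum(targets)
    dice = (2.0 * intersection + smooth) / (denominator + smooth)
    return 1.0 - dice


# Toy example: a mostly-background mask with one foreground pixel.
print(dice_loss([0.1, 0.2, 0.9, 0.1], [0, 0, 1, 0]))
```

Because the score is driven by overlap on the foreground class, a large number of easy background pixels influences it less than it would a per-pixel average loss.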
0 votes, 0 answers

Select the maximum number of rows so that the sum of the columns is balanced

Suppose I have a table with the following columns and many more rows:

Id  n_positive_class1  n_positive_class2  n_positive_class3
1   0                  10                 4000
2   122                0                  0
3   4                  5234               0

I'd like to select the maximum number of rows (by Id) so that the sum of…
user11696358
  • 356
  • 1
  • 15
0 votes, 0 answers

Improving classification model (f1_score): real images vs generative images (fake)

I am working on a model to detect images. There are 2 classes, real and generative (fake). I can't get higher than 0.85 f1_score. Any recommendations how to improve the score? The data set contains 4000 real images (4000, 1200) and 2000 fake images…
0 votes, 1 answer

Which evaluation metric will be suitable for a classification problem with an imbalanced dataset?

I have class X with 1000 observations and class Y with 2000 observations. I am trying to decide which classification evaluation metric is most appropriate here and why. Precision Recall Curve. AUC ROC Simple accuracy metric Confusion matrix and…
0 votes, 0 answers

Re-weight with WeightIt

I weighted my population using the WeightIt package: library(WeightIt) library(cobalt) data("lalonde", package = "cobalt") W.out <- weightit(treat ~ age + married + race, data = lalonde, estimand = "ATE", method = "ps") bal.tab(W.out,…
0 votes, 0 answers

After Oversampling with SMOTE and IsolationForest my result doesn't improve

My test set has 17565 samples of class 0 and 2435 of class 1; my training set has 70212 of class 0 and 9788 of class 1. I applied SMOTE oversampling with the IsolationForest algorithm on just the training set. Results before oversampling: F1 Score : 0.9278732648748262 Accuracy Score…
0 votes, 0 answers

How to construct an imbalanced MNIST-dataset based on a pre-defined gini-coefficient?

My goal is to make different versions of the MNIST dataset with different pre-defined levels of imbalancedness. A gini-coefficient (range: 0-1) is a measure of imbalancedness of a dataset where 0 represents perfect equality and 1 represents perfect…
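One way to measure the Gini coefficient of a class distribution, sketched under the assumption that the coefficient is computed over per-class sample counts with the standard sorted-rank formula. The function name `gini_coefficient` and the example counts are hypothetical. Note that with k classes this discrete formula tops out at (k - 1) / k, so a value of exactly 1 is only approached as k grows.

```python
def gini_coefficient(class_counts):
    """Gini coefficient of a list of non-negative per-class sample counts."""
    counts = sorted(class_counts)
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one class with a non-zero count")
    # Rank-weighted sum over the counts sorted in ascending order.
    weighted = sum(rank * c for rank, c in enumerate(counts, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical 10-class count vectors, e.g. for an MNIST-like dataset.
balanced = [6000] * 10
skewed = [6000, 3000, 1500, 750, 400, 200, 100, 50, 25, 10]
print(gini_coefficient(balanced))
print(gini_coefficient(skewed))
```

A dataset with a target coefficient could then be searched for by adjusting the per-class counts and re-evaluating this function until the value is close enough.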
0 votes, 0 answers

Is it bad to have a high precision, recall, and fbeta on a 1:5 imbalanced dataset?

I have a research project using random forest to differentiate whether data is bot- or human-generated. The machine learning model achieved extremely high accuracy. Here is the result: Confusion matrix: [[420 8] [ 40 20]] Precision: …
das
  • 19
  • 1
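For a confusion matrix like the one quoted above, per-class metrics can be derived by hand. This sketch assumes the scikit-learn layout (rows are true classes, columns are predicted classes, class 0 first) and treats class 1 as the positive class; both are assumptions, since the excerpt does not say. The helper `binary_metrics` is hypothetical, and the values it yields are not necessarily the ones the poster reported.

```python
def binary_metrics(confusion):
    """Accuracy, precision, recall and F1 from a 2x2 confusion matrix.

    Assumed layout: [[TN, FP], [FN, TP]] (rows = true, columns = predicted).
    """
    (tn, fp), (fn, tp) = confusion
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = binary_metrics([[420, 8], [40, 20]])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under that assumed layout, accuracy comes out near 0.90 while recall on the positive class is about 0.33, which is why accuracy alone can look better than the minority-class performance really is.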
0 votes, 0 answers

How to generalize the SMOTE algorithm using Weka?

I am working on an imbalanced dataset, and I use SMOTE to balance the data. I build my model using Weka. I want to make SMOTE a part of my model, so that when I apply my model to another imbalanced dataset, the model can increase the number of…
Muneera
  • 11
  • 2
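The interpolation step at the heart of SMOTE can be sketched without any library. This is a simplified illustration, not Weka's SMOTE filter: the function `smote_sketch`, its parameters, and the toy data are all hypothetical, and the brute-force neighbour search is only suitable for small inputs.

```python
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(toy_minority, n_synthetic=6, k=2)
print(new_points)
```

Packaging a step like this with the classifier, so that it runs on whatever training data the model is fitted to, is one way to read "making SMOTE part of the model".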
0 votes, 0 answers

In an imbalanced classification problem, is it fair to balance the test data by removing some negative examples?

I'm working on an imbalanced classification problem. Regarding evaluation on the test set, I have the following questions: Is it fair practice to use a balanced test set for evaluation? To create a balanced test set, I'm removing the negative examples…
0 votes, 0 answers

Running LightGBM algorithm after the implementation of SMOTE function to mitigate the issue of managing imbalanced dataset

I need to run a LightGBM model with an imbalanced dataset. The dataset has a 'Target' variable with a binary result, "0" with 61471 records and "1" with 4456 records. To mitigate the problem of the imbalanced dataset I ran the SMOTE function in the…
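As a point of comparison with resampling, the class counts quoted above can be turned into class weights. This is a different technique from SMOTE, shown only as a sketch: the helper `balanced_class_weights` is hypothetical and follows the common "balanced" heuristic, n_samples / (n_classes * class_count). The negative-to-positive ratio it prints is the kind of value sometimes passed to LightGBM's `scale_pos_weight` parameter; whether that suits this dataset is an assumption to verify.

```python
def balanced_class_weights(class_counts):
    """Per-class weights inversely proportional to class frequency."""
    if any(count <= 0 for count in class_counts.values()):
        raise ValueError("every class needs a positive count")
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


# Counts taken from the question: 61471 records of "0", 4456 of "1".
counts = {0: 61471, 1: 4456}
weights = balanced_class_weights(counts)
neg_to_pos_ratio = counts[0] / counts[1]

print(weights)
print(neg_to_pos_ratio)
```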