Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs when:

  • "The user assigns more importance to the predictive performance... on a subset of the target variable domain."
  • "[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro. (2016) A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.

Software

Related Tags and Techniques

351 questions
0
votes
1 answer

In imbalanced datasets: the positive class is the majority class

I use the Weka platform. I am working on an imbalanced dataset, and the majority class is the positive class. I aim to apply different classifiers and evaluate their performance using several evaluation metrics, including AUC. My question is: Are…
Muneera
0
votes
1 answer

How does sklearn calculate accuracy on the validation set when XGBoost is given class weights?

I am using XGBoost's sklearn API with sklearn's RandomizedSearchCV() to train a boosted tree model with cross validation. My problem is imbalanced, so I've supplied the scale_pos_weight parameter to my XGBClassifier. For simplicity, let's say that…
Eli
0
votes
0 answers

scale_pos_weight grid.fit mismatch

Only the clever can figure this out! If you tune scale_pos_weight in XGBoost, find the best value, and then refit the model with that value, a different score comes up, e.g.: Best: 0.950298 using {'scale_pos_weight':…
0
votes
1 answer

How to do these in Weka: cross validation + imbalanced data + feature selection

I have an imbalanced dataset (classification dataset). Using the Weka platform, I want to apply these techniques: cross validation, balancing the training folds, and feature selection. So, I did the following (from the Classify tab): I chose 10-fold…
0
votes
0 answers

Multi-class imbalance with a dataset of logical columns

I have a dataset (df) that looks like this:

Item_1  Item_2  Item_3  Item_n  Type
TRUE    FALSE   FALSE   TRUE    Type A
TRUE    FALSE   FALSE   FALSE   Type B
TRUE    TRUE    FALSE   FALSE   Type_n

where the number of observations per type is…
Roberto
0
votes
0 answers

Error in checkMeasures(measures, learner) : object 'fbeta' not found

I am doing an imbalanced classification task, so I want to use f-beta as the performance measure. I used library(mlr) to set measures=fbeta, as follows: library(mlr) #create tasks ## Create combined training data train_data <- cbind(x_train,…
ebrahimi
0
votes
0 answers

There were missing values in resampled performance measures

I need to do a classification task on this dataset. As the following code shows, I tried to implement xgboost using the caret package. Since my dataset is imbalanced, I prefer to use F-score as the performance measure. Furthermore, I need to use the first…
ebrahimi
0
votes
0 answers

Imbalanced classes in a graph classification problem

Graph classification problem. Input data size: (1280, 32, 16) -> 32 EEG channels, 16 features for each channel. Labels size: (1280) -> 2 classes. I want to classify 1280 samples with feature size (1280, 32, 16) through a graph convolution neural network…
0
votes
0 answers

PR curve is strange

I use a TabTransformer network to classify a binary imbalanced dataset. After getting the probabilities, I plot the ROC and PR curves using scikit-learn and get a figure like this. The ROC looks normal, but the PR curve is strange. I'm using code…
0
votes
0 answers

Change class weights and classification threshold to deal with unbalanced dataset

I'm working on my thesis and I used a CatBoost classifier to perform a binary analysis on a very unbalanced dataset: class0 = x number of samples, class1 = 10*x number of samples. In order to optimize the performance of the model I changed the…
0
votes
0 answers

Using CrossEntropyLoss weights with ResNet18 (PyTorch)

I'm having a problem with using weights in my loss function. I have a really imbalanced dataset with 7 classes, so I calculated the weight for each class and put it in a tensor. The list I converted to a tensor looks like this [0.8901, 0.3295, 0.9885, 0.8887,…
0
votes
1 answer

How to use class_weight in custom knowledge distillation model train_step

I want to predict imbalanced data using a knowledge distillation Keras model. The y label value count is like this: y_train.value_counts() 0 9024 1 842 Name: Y_LABEL, dtype: int64. To predict imbalanced data, I tried to use class_weight, but I…
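For readers skimming this excerpt, here is a minimal sketch of one common way to turn the label counts quoted above into a class_weight dictionary. It uses the n_samples / (n_classes * count) heuristic, which is the formula behind scikit-learn's "balanced" option. The helper name balanced_class_weights is hypothetical and is not taken from the question, and the sketch does not show how the asker applied weights inside a custom train_step.

```python
# Label counts copied from the y_train.value_counts() output in the excerpt.
class_counts = {0: 9024, 1: 842}


def balanced_class_weights(counts):
    """Hypothetical helper: weight each class by n_samples / (n_classes * count).

    Rare classes receive proportionally larger weights, so every class
    contributes the same total weight to the loss.
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


class_weight = balanced_class_weights(class_counts)
print(class_weight)
```

With this heuristic the minority class (label 1) gets the larger weight, and the product of weight and count is the same for both classes. Whether these particular weights suit a given model is a judgment call and may need tuning.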
0
votes
0 answers

How do I deal with class imbalance when using sparklyr with MLlib?

I have a severe class imbalance where the positive response is about 3%. In absolute terms, that 3% is about 6000 rows. I'm currently using sparklyr and MLlib's algorithms. Some of the native Databricks MLlib algorithms have class weight as a parameter. Is…
Choc_waffles
0
votes
0 answers

Unstable loss in binary classification for time-series data - extremely imbalanced dataset

I am working on a deep learning model to detect regions of timesteps with anomalies. This model should classify each timestep as possessing the anomaly or not. My labels are something like this: labels = [0 0 0 1 0 0 0 0 1 0 0 0 ...] The 0s represent…
Kunis
0
votes
0 answers

Imbalanced categorical predictors cross validation with continuous target

I am working on a project where I want to measure the predictive performance of some categorical variables on click-through rate (continuous). However, the categorical variables are highly imbalanced: packaged_goods: 796 food: 104 person:…