Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs in machine learning when:

"The user assigns more importance to the predictive performance... on a subset of the target variable domain."

"[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro. (2016) A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.

Software

imbalanced-learn: imblearn

Related Tags and Techniques

SMOTE: smote (Synthetic Minority Oversampling Technique)
Resampling: resampling
Oversampling: oversampling
Downsampling: downsampling
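As a rough illustration of the oversampling idea listed above, here is a minimal sketch using only the Python standard library. The function name random_oversample and the toy data are hypothetical and chosen for this example; this is plain random duplication of minority rows, not SMOTE (which synthesizes new points) and not the imblearn API.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw (with replacement) enough extra rows to reach the target size
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data (hypothetical): five rows of class 0, one row of class 1
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
labels = [0, 0, 0, 0, 0, 1]
X_res, y_res = random_oversample(rows, labels)
print(Counter(y_res))
```

When combined with cross-validation, resampling of this kind is normally applied to the training folds only, so that duplicated rows do not leak into the evaluation data.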

351 questions

1 answer

In imbalanced datasets: the positive class is the majority class

I use the Weka platform. I am working on an imbalanced dataset, and the majority class is the positive class. I aim to apply different classifiers and evaluate their performance using several evaluation metrics, including AUC. My question is: Are…

weka metrics evaluation auc imbalanced-data

asked Jan 08 '23 at 16:22

Muneera

1 answer

How does sklearn calculate accuracy on the validation set when XGBoost is given class weights?

I am using XGBoost's sklearn API with sklearn's RandomizedSearchCV() to train a boosted tree model with cross validation. My problem is imbalanced, so I've supplied the scale_pos_weight parameter to my XGBClassifier. For simplicity, let's say that…

python scikit-learn xgboost imbalanced-data xgbclassifier

asked Jan 05 '23 at 15:42

Eli

0 answers

scale_pos_weight grid.fit mismatch

Only the clever can figure out! If somebody works out the scale_pos_weight in XGBoost, after finding out the best point, if they refit the model with that point, another figure comes up! i.e.: Best: 0.950298 using {'scale_pos_weight':…

imbalanced-data xgbclassifier

asked Dec 23 '22 at 16:09

Reza Paradise

1 answer

How to do these in weka: cross validation + imbalanced data + feature selection

I have an imbalanced classification dataset. Using the Weka platform, I want to apply these techniques: cross validation, balancing the training folds, and feature selection. So, I did the following (from the Classify tab): I chose 10-fold…

machine-learning weka cross-validation feature-selection imbalanced-data

asked Dec 23 '22 at 09:22

Muneera

0 answers

Multi-class imbalance with a dataset of logical columns

I have a dataset (df) that looks like this:

Item_1  Item_2  Item_3  Item_n  Type
TRUE    FALSE   FALSE   TRUE    Type A
TRUE    FALSE   FALSE   FALSE   Type B
TRUE    TRUE    FALSE   FALSE   Type_n

where the number of observations per type is…

r imbalanced-data smote

asked Dec 17 '22 at 22:32

Roberto

0 answers

Error in checkMeasures(measures, learner) : object 'fbeta' not found

I am doing an imbalanced classification task, so I want to use f-beta as performance measure. I used the library(mlr) to set measures=fbeta, which follows: library(mlr) #create tasks ## Create combined training data train_data <- cbind(x_train,…

r classification xgboost mlr imbalanced-data

asked Dec 17 '22 at 06:03

ebrahimi

0 answers

There were missing values in resampled performance measures

I need to do a classification task on this dataset. As the following code shows, I tried to implement xgboost using caret package. Since my dataset is imbalanced, I prefer to use Fscore as performance measure. Furthermore, I need to use the first…

r classification xgboost r-caret imbalanced-data

asked Dec 16 '22 at 03:44

ebrahimi

0 answers

Imbalanced classes in a graph classification problem

Graph classification problem. Input data size: (1280, 32, 16), i.e. 32 EEG channels with 16 features per channel. Label size: (1280), with 2 classes. I want to classify the 1280 samples, with feature size (1280, 32, 16), using a graph convolutional neural network…

python graph classification imbalanced-data eeglab

asked Dec 15 '22 at 17:05

Peter Lion

0 answers

PR curve is strange

I use a tab transformer network to classify an imbalanced binary dataset. After getting the probabilities, I plot the ROC and PR curves using scikit-learn and get a figure like this. The ROC curve looks normal, but the PR curve is strange. I'm using code…

python scikit-learn precision-recall imbalanced-data

asked Dec 12 '22 at 01:55

user18391472

0 answers

Change class weights and classification threshold to deal with unbalanced dataset

I'm working on my thesis and I used a CatBoost classifier to perform a binary analysis on a very unbalanced dataset: class0 = x samples, class1 = 10*x samples. In order to optimize the performance of the model, I changed the…

classification catboost imbalanced-data

asked Dec 09 '22 at 14:42

Francesco De Santis

0 answers

Using CrossEntropyLoss weights with ResNet18 (PyTorch)

I'm having a problem with using weights in my loss function. I have a really imbalanced dataset with 7 classes, so I calculated the weight for each class and put it in a tensor. The list I converted to a tensor looks like this: [0.8901, 0.3295, 0.9885, 0.8887,…

python pytorch conv-neural-network resnet imbalanced-data

asked Dec 06 '22 at 11:54

philippe

1 answer

How to use class_weight in custom knowledge distillation model train_step

I want to predict imbalanced data using a Keras knowledge distillation model. The label counts from y_train.value_counts() look like this: 0: 9024, 1: 842 (Name: Y_LABEL, dtype: int64). To predict imbalanced data, I tried to use class_weight, but I…

python keras training-data imbalanced-data

asked Dec 05 '22 at 04:39

Jaeyoung Park

0 answers

How do I deal with class imbalance when using sparklyr with MLlib?

I have a severe class imbalance where the positive response is about 3%. In absolute terms, that 3% is about 6000 rows. I'm currently using sparklyr and MLlib's algorithms. Some of the native Databricks MLlib algorithms take class weights as a parameter. Is…

apache-spark-sql sparklyr imbalanced-data

asked Dec 05 '22 at 01:25

Choc_waffles

0 answers

Unstable loss in binary classification for time-series data - extremely imbalanced dataset

I am working on a deep learning model to detect regions of timesteps with anomalies. This model should classify each timestep as containing an anomaly or not. My labels are something like this: labels = [0 0 0 1 0 0 0 0 1 0 0 0 ...] The 0s represent…

tensorflow keras time-series lstm imbalanced-data

asked Nov 28 '22 at 01:22

Kunis

0 answers

Imbalanced categorical predictors cross validation with continuous target

I am working on a project where I want to measure the predictive performance of some categorical variables on click-through rate (continuous). However, the categorical variables are highly imbalanced: packaged_goods: 796 food: 104 person:…

regression cross-validation train-test-split imbalanced-data k-fold

asked Nov 24 '22 at 20:28

donhendriko
