Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs in machine learning when:

"The user assigns more importance to the predictive performance... on a subset of the target variable domain."

"[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro. (2016) A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.

Software

imbalanced-learn: imblearn

Related Tags and Techniques

SMOTE: smote (Synthetic Minority Oversampling Technique)
Resampling: resampling
Oversampling: oversampling
Downsampling: downsampling
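
As a rough illustration of the oversampling and downsampling tags above, the following is a minimal sketch of random resampling in plain Python. The function names random_oversample and random_downsample are hypothetical helpers written for this example, not part of imbalanced-learn or any other library, and the toy data is invented. SMOTE differs in that it synthesizes new minority points by interpolation rather than duplicating existing rows.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement from this class to fill the gap.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_downsample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class
    (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw without replacement so no row is kept twice.
        kept = rng.sample(pool, target)
        out_rows.extend(kept)
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Toy data (invented for illustration): 9 rows of class 0, 1 row of class 1.
rows = [[i] for i in range(10)]
labels = [0] * 9 + [1]

over_rows, over_labels = random_oversample(rows, labels)
down_rows, down_labels = random_downsample(rows, labels)

print(Counter(over_labels))  # both classes now have 9 rows
print(Counter(down_labels))  # both classes now have 1 row
```

Resampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.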

351 questions

0 answers

performance metrics stratified cross validation

I have implemented stratified cross validation for a multiclass imbalanced dataset. I'm unable to calculate the average of each performance metric, such as precision, recall, etc. skf = StratifiedKFold(n_splits=10) lst_accu_stratified =…

machine-learning scikit-learn cross-validation decision-tree imbalanced-data

asked Nov 22 '22 at 10:00

Nashm ia

0 answers

How to find documents similar to a predefined set of documents

From a big population of documents I would like to find those similar to a predefined set of documents. All documents inside the set are similar to each other, but very few documents from the population are similar to those in the set. Quite unbalanced…

machine-learning nlp cosine-similarity imbalanced-data

asked Nov 21 '22 at 23:58

user3757753

0 answers

Optimal metric for training with Class-specific masked input features and imbalanced dataset

I have an 8-class classification problem in which the classes are extremely imbalanced. The input dataset consists of sequences, each of length n features, where n = 19. For each of the 8 classes, I have prior knowledge of which subset of the n features are…

machine-learning neural-network imbalanced-data statistical-test

asked Nov 21 '22 at 19:26

HATEM EL-AZAB

0 answers

Error: `data` and `reference` should be factors with the same levels for imbalanced class

I used the SMOTE and Tomek methods for my imbalanced classes. I'm trying to fit a boosted regression tree. It runs smoothly until I create the confusion matrix, where I get this error ( Error: data and reference should be factors with the same…

confusion-matrix levels imbalanced-data

asked Nov 15 '22 at 03:57

Hanan

0 answers

CEM: Different Imbalance results in R and Stata

I am trying to replicate a cem matching from Stata in R. As a first step I want to evaluate the imbalance. In R I used the following code: vars <- c("X1PLTOT", "X1EBRSTOT", "X1MTHETK2", "X1RTHETK2", "X1DCCSTOT", "X1NRWABL", "female", "latino",…

r imbalanced-data propensity-score-matching causality

asked Nov 09 '22 at 10:29

Katharina

0 answers

Can MRMR be used for imbalanced dataset?

I tried using MRMR on a dataset in which about 10% of the rows have class '1' and the remaining 90% have class '0'. I used the MRMR code shown below with K=10. However, I realized that after using count ifs there were more rows that each selected…

feature-selection imbalanced-data

asked Nov 09 '22 at 02:23

ABDULMUJEEB

0 answers

Use of data augmentation to achieve a balanced dataset

The theoretical case is a binary image classification task with 70% of the data labeled A and the other 30% labeled B. Data augmentation is generally used to avoid overfitting and get better generalization, but can I also…

classification data-augmentation imbalanced-data

asked Nov 02 '22 at 15:34

Dom R.

0 answers

Is there a cost-sensitive loss function implementation in PyTorch?

I would like to implement a cost-sensitive loss function in PyTorch. My two-class training dataset is heavily imbalanced, where 75% of the data are label '0' and only 25% of the data are label '1'. I am new to PyTorch but my supervisor is adamant…

python conv-neural-network loss-function imbalanced-data

asked Oct 31 '22 at 23:13

Horacio_Screen

1 answer

Stratified sampling for semantic segmentation

I have a set of images and multi-label masks (an image usually has segments of more than one class) and I would like to split it into train and validation sets. The data is imbalanced, where two of the classes appear in about 1% of the images and…

statistics sampling multilabel-classification semantic-segmentation imbalanced-data

asked Oct 26 '22 at 03:34

ayabp

1 answer

stratify sklearn train_test_split using dummy vector for 'stratify' parameter

I want to split my data into train, val, and test sets, using the stratify parameter of the train_test_split function. I want to use a binary dummy vector (the vector name is prop) for the stratify parameter, making the test's labels proportion the…

python scikit-learn training-data imbalanced-data test-data

asked Oct 24 '22 at 16:16

Amit S

0 answers

ROSE() in R giving me negative samples when all values in training set are positive integers

I am oversampling my training dataset using ROSE() in R as below, and the oversampled dataset contains several negative values for columns that are meant to be strictly positive. The original training data is also positive, so I am surprised that…

r imbalanced-data oversampling

asked Oct 21 '22 at 22:10

DV24

0 answers

MiniBatches there are no samples for class label exception

I was following the first example given in the Accord.Net framework's documentation here to train a multi-class SVM classifier with my own dataset, but during the training loop I got an error that says: There are no samples for class label 3. Please…

classification svm imbalanced-data accord.net mini-batch

asked Oct 20 '22 at 11:29

Fatih Barmanbay

0 answers

Imbalanced data: precision and recall when the minority is negative case instead of positive case

I have an imbalanced dataset where 90% of cases have Y = 1 and 10% of cases have Y = 0. In this case, do precision and recall still apply? Because precision and recall focus on true positives (TP), which is not the case in my dataset. In my…

precision-recall imbalanced-data

asked Oct 20 '22 at 04:38

ycenycute

1 answer

High AUC and 100% recall, but precision and F1 are low

I have an imbalanced dataset of 43323 rows, of which 9 belong to the 'failure' class and the rest to the 'normal' class. I trained a classifier with 100% recall and 94.89% AUC on the test data (0.75/0.25 split with stratify = y). However, the…

machine-learning precision roc precision-recall imbalanced-data

asked Oct 19 '22 at 10:41

ERIC_STAR

1 answer

How to process "strong" imbalanced data for multi-label image classification with transfer learning

I tried this myself but couldn't reach a solution, which is why I'm posting here; please guide me. I'm working on multi-label image classification and have a slightly different scenario. I have a big and significantly imbalanced dataset. You can see the…

python keras deep-learning multilabel-classification imbalanced-data

asked Oct 17 '22 at 04:08

NHT_99
