Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs in machine learning when:

"The user assigns more importance to the predictive performance... on a subset of the target variable domain."

"[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro. (2016) A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.

Software

imbalanced-learn: imblearn

Related Tags and Techniques

SMOTE: smote (Synthetic Minority Oversampling Technique)
Resampling: resampling
Oversampling: oversampling
Downsampling: downsampling

351 questions

0 answers

Oversampling with only nominal features: which over- or undersampling techniques could be valid in this case?

I have data where all features are nominal. I applied SMOTE-NC, then found that it only works with a combination of nominal and continuous features. There is a technique called SMOTE-N (to deal with only nominal features) in the same paper of…

python machine-learning imbalanced-data oversampling smote

asked Jul 06 '20 at 14:12

Hanan

1 answer

Ignore columns in SMOTE oversampling

I have six feature columns and one target column, which is imbalanced. Can I apply an oversampling method like ADASYN or SMOTE that creates synthetic records only for the four columns X1, X2, X3, X4 while copying the constant columns exactly as they are (Month, year…

machine-learning imbalanced-data smote

asked Jun 23 '20 at 14:05

Ayyasamy

0 answers

Evaluating Model Outcome on Test Set After Downsampling Training Data because of Class Imbalance

I'm working with an extremely class-imbalanced data set (positive cases make up ~0.1%) and have explored a number of different sampling techniques to help improve the model performance (measured by AUPRC). Since I only have a few thousand…

machine-learning classification imbalanced-data

asked Jun 12 '20 at 16:06

shadowprice

1 answer

Meaning of weighted metrics in scikit-learn: does the bigger or the smaller class get more weight?

I am dealing with an imbalanced dataset and tried to handle it through the validation metric. In the scikit-learn documentation I found the following for weighted: Calculate metrics for each label, and find their average weighted by support (the number of true instances…

scikit-learn metrics imbalanced-data

asked Jun 07 '20 at 13:27

nopact
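
A minimal sketch of what support weighting means for the question above, assuming "weighted" averages the per-class scores by the number of true instances, as the quoted documentation says. The toy labels and the choice of recall as the per-class metric are illustrative assumptions, and the averaging is done by hand in NumPy rather than through scikit-learn:

```python
import numpy as np

# Hypothetical toy labels: class 0 has 8 true instances, class 1 has 2.
y_true = np.array([0] * 8 + [1] * 2)
# Class 0 is predicted perfectly; class 1 is right half of the time.
y_pred = np.array([0] * 8 + [0, 1])

labels = np.unique(y_true)

# Per-class recall: fraction of each class's true instances predicted correctly.
recall = np.array([np.mean(y_pred[y_true == c] == c) for c in labels])

# Support: number of true instances per class.
support = np.array([np.sum(y_true == c) for c in labels])

# Macro average treats every class equally.
macro = recall.mean()

# Support-weighted average gives the larger class more influence.
weighted = np.average(recall, weights=support)

print("per-class recall:", recall)
print("macro:", macro, "weighted:", weighted)
```

Under this reading, the larger class contributes more to the weighted score, so a weak minority class is partly hidden; the macro average is the one that exposes it.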

1 answer

How to undersample/oversample more than two classes' dataset using "imblearn" library in Python?

I am working with the "imblearn" library for undersampling. I have four classes in my dataset with 20, 30, 40 and 50 samples respectively (so the classes are imbalanced). But when I try to undersample the dataset using "fit_resample(X, y)", it only…

python python-3.x python-2.7 imbalanced-data imblearn

asked May 31 '20 at 20:03

Rawnak Yazdani

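
A rough sketch of random undersampling across more than two classes, written in plain NumPy so the mechanics are visible. It is not imblearn's implementation; the feature matrix, the seed, and the rule of cutting every class down to the smallest class size are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data with the class sizes from the question: 20, 30, 40, 50.
y = np.repeat([0, 1, 2, 3], [20, 30, 40, 50])
X = rng.random((y.size, 5))  # 5 made-up features per sample

classes, counts = np.unique(y, return_counts=True)
n_keep = counts.min()  # size of the smallest class (20 here)

# For each class, draw n_keep row indices without replacement.
keep = np.concatenate([
    rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
    for c in classes
])

X_under, y_under = X[keep], y[keep]
print(np.bincount(y_under))
```

Every class ends up with as many rows as the smallest one, which is the behaviour one would usually want from a multi-class undersampler.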
1 answer

How can I reshape (120, 100, 100) shaped image data to (120, 10000) shape to undersample using "imblearn" library of Python?

I am working with imblearn library of Python for undersampling. Necessary code: undersample = RandomUnderSampler(sampling_strategy='majority') X_under, y_under = undersample.fit_resample(X, y) Here X is my image dataset & of (120, 100, 100) shape…

python imbalanced-data imblearn

asked May 31 '20 at 17:50

Rawnak Yazdani

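
A small sketch of the reshape asked about above, assuming the resampler expects a 2-D array of shape (n_samples, n_features). The random array stands in for the image data, and the slice is only a placeholder for whatever rows a resampler would return:

```python
import numpy as np

# Hypothetical stand-in for the image data: 120 images of 100 x 100 pixels.
X = np.random.default_rng(0).random((120, 100, 100))

# Flatten each image into one row; -1 lets NumPy infer 100 * 100 = 10000.
X_flat = X.reshape(X.shape[0], -1)
print(X_flat.shape)

# Placeholder for the resampling step: pretend the first 50 rows were kept.
X_under = X_flat[:50]

# Restore the image shape for the rows that survive resampling.
X_images = X_under.reshape(-1, 100, 100)
print(X_images.shape)
```

Because reshape only changes the view of the data, the round trip gives back the original pixel layout for each retained image.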
1 answer

How to pass an argument to a function within a customized function?

First of all the code snippet: ## Packages from sklearn.svm import SVC from sklearn.model_selection import train_test_split from sklearn.metrics import fbeta_score from imblearn.over_sampling import RandomOverSampler from sklearn.datasets import…

python function object sampling imbalanced-data

asked May 24 '20 at 18:49

RazorLazor

0 answers

R imbalance package Error in Ops.data.frame(dataset[, classAttr], minorityClass) ‘==’ only defined for equally-sized data frames

So whenever I try to use some imbalance function on my dataset I get this error: Error in Ops.data.frame(dataset[, classAttr], minorityClass) : ‘==’ only defined for equally-sized data frames This is my code: dset <-…

r imbalanced-data

asked May 23 '20 at 07:53

DolceVita34

0 answers

Creation of synthetic data - Balance a dataset

I'm analyzing the Pokemon dataset. I'd like to build a random forest to predict whether a Pokemon is legendary or not. Right now, I have a training dataset of 118 observations and 44 columns: variables: $ type1_bug : int 0 0…

r machine-learning imbalanced-data smote

asked May 21 '20 at 09:52

Panri93

1 answer

Renormalizing class weights for imbalanced data

I have a set of imbalanced data for training a CNN. I want to calculate class weights that are inversely proportional to the frequency of each label, such that labels that are less frequent will be enhanced when calculating the…

machine-learning ipython classification normalization imbalanced-data

asked May 14 '20 at 08:19

ether212
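
One possible way to compute and renormalize such class weights, sketched in NumPy. The label array is made up, the weights are taken as inversely proportional to class frequency, and normalizing them to sum to 1 is just one convention among several:

```python
import numpy as np

# Hypothetical integer labels: class 0 appears 6 times, class 1 three times,
# class 2 once.
y = np.array([0] * 6 + [1] * 3 + [2] * 1)

counts = np.bincount(y)  # number of samples per class

# Inverse-frequency weights: n_samples / (n_classes * count per class).
weights = len(y) / (len(counts) * counts)

# Renormalize so the weights sum to 1 (an assumed convention).
normalized = weights / weights.sum()

print("counts:", counts)
print("normalized weights:", normalized)
```

The rarest class receives the largest weight, and the normalized weights keep the same ratios as the raw inverse frequencies.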

1 answer

Steps for a highly imbalanced classification: should I up-sample and under-sample the data, or just up-sample the minority class?

I have a highly imbalanced binary (yes/no) classification dataset. The dataset currently has approximately 0.008% 'yes'. I need to balance the dataset using SMOTE. I came across two methods to deal with the imbalance. The following steps after I have run…

python-3.x imbalanced-data smote

asked May 13 '20 at 18:08

John Doe

2 answers

F1 score reduced after using class weight

I am working on a multi-class classification use case and the data is highly imbalanced. By highly imbalanced I mean that there is a huge difference between the class with the maximum frequency and the class with the minimum frequency. So if I go ahead…

python machine-learning scikit-learn classification imbalanced-data

asked May 11 '20 at 09:48

learnToCode

0 answers

TypeError: 'int' object is not subscriptable (imblearn generator)

I am dealing with an imbalanced text-based dataset. I used the tensorflow balanced batch generator to create balanced batches when training a model, as follows: batch_generator, steps_per_epoch = balanced_batch_generator(training_x, training_y, BATCH, …

tensorflow keras imbalanced-data

asked May 06 '20 at 21:01

Elham

0 answers

How do I double the data for classes that have fewer images than other classes?

My training data is imbalanced, so I decided to resample my dataset. I want to make slight changes while resampling: I'd like to apply a horizontal flip and a Gaussian filter to the minority classes to make all classes equal. To do so, I'd like to use…

image-processing deep-learning classification data-augmentation imbalanced-data

asked Apr 29 '20 at 08:21

nikki

0 answers

Imbalanced text classification by oversampling: correcting class probabilities

My dataset has 3 classes and 900 examples for training. The class distribution is 220, 185, and 500. I found that if I oversample the training data then I have to correct/calibrate the predicted probability of the test data because after oversampling the…

nlp imbalanced-data

asked Apr 25 '20 at 21:16

user3363813
