Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs when:

  • "The user assigns more importance to the predictive performance... on a subset of the target variable domain."
  • "[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro. (2016) A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.

351 questions
0
votes
0 answers

Image data is highly imbalanced for CNN

I want to do binary image classification using a CNN, but my dataset is highly imbalanced: it contains only 100 images of class 0 and 30,000 images of class 1. The dataset looks like…
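One common starting point for a split like 100 versus 30,000 is to weight the loss by inverse class frequency. The sketch below computes the "balanced" heuristic, n_samples / (n_classes * class_count), which is the formula scikit-learn documents for class_weight="balanced". The function name balanced_class_weights is hypothetical, and the label list is a stand-in built from the counts given in the question. Whether weighting alone is enough for this dataset cannot be judged from the excerpt.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)} (hypothetical helper)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Stand-in labels using the counts from the question:
# 100 images of class 0 and 30,000 images of class 1.
labels = [0] * 100 + [1] * 30000
weights = balanced_class_weights(labels)
print(weights)
```

A dictionary of this shape can be passed as the class_weight argument of Keras Model.fit, so that errors on the rare class contribute more to the loss.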
0
votes
0 answers

How to apply SMOTE on ImageDataGenerator iterator object?

I am working on a medical image classification task and the dataset is so imbalanced that I need to apply SMOTE oversampling technique. train_image_generator = ImageDataGenerator(samplewise_center=True,…
0
votes
0 answers

Imbalanced Dataset Challenges with Force-Displacement Curves: Seeking Solutions

I have a dataset consisting of force-displacement curves. The dataset is heavily imbalanced, with the negative class having 29,000 samples and the positive class having only 100 samples. After transforming the force-displacement curves with tsfresh,…
0
votes
0 answers

themis::tomek() returning 0 observations

I have an imbalanced dataset of 227846 observations and 30 columns and I would like to apply smote from the smotefamily library and then tomek from the themis library on the smote data to make the dataset balanced/near balanced. The original dataset…
0
votes
0 answers

Cost-sensitive learning for multiclass classification on an imbalanced dataset

I am working on cost-sensitive learning on imbalanced datasets. I wanted to try it with C50cost in R, but the documentation only gives examples for binary classification. I want to experiment on a dataset with four classes. How can I…
Ayse
0
votes
0 answers

Zero F1-score, precision and recall in one of the classes

I am working on a 5-class classification problem and the result of the benchmark by classes for the test data is as follows: As you can see, zero value has been obtained for the third class. The question is, is there anything wrong with this…
0
votes
0 answers

Can the SMOTE Method be used on Image datasets for image dataset imbalances?

Can SMOTE (Synthetic Minority Oversampling Technique), a method for handling imbalanced datasets, be applied to image datasets? As far as I know, SMOTE is only used for structured data. If the SMOTE method can be applied to image…
0
votes
0 answers

Trying to use XGBoost after applying SMOTENC on df, getting ValueError: A given column is not a column of the dataframe

Since my df is not balanced (per the response column), I am trying to balance the response column and then apply XGBoost to find the best predictors from the df. After running my code, I'm getting ValueError: A given column is not a column of the…
0
votes
1 answer

Creating balanced bootstrap resamples in caret

I'm using caret to compare models for a classification problem with nested CV. Vfold in the outer loop and bootstrap (500 replicates) in the inner loop. I get this error after training knn: Warning: There were missing values in resampled performance…
0
votes
0 answers

Are Transformers (positional embedding + encoder) slow to train?

I am completely new to transformers. I built a transformer-based model that has the encoder and positional embedding parts only. I stacked 12 of them to classify around 1 million samples of time series data. The model is very, very slow (around…
0
votes
0 answers

How to perform RandomOverSampler with cross validation

I have imbalanced data that I want to classify with a random forest using cross-validation (cv=5). I want to use RandomOverSampler() to balance it. How can I do that? Then, I want to predict and get a confusion matrix and accuracy for the 5…
0
votes
1 answer

NearMiss gives this error when an argument is passed: __init__() takes 1 positional argument but 2 were given

This is the code I was using for imbalanced data to do under sampling over dataset. from collections import Counter from imblearn.under_sampling import NearMiss ns=NearMiss(0.8) X_train_ns, y_train_ns = ns.fit_resample(X_train,y_train) print("The…
0
votes
1 answer

Random Forest Classifier predicts lower proportion of positive cases compared to the actual

I am using scikit-learn Random Forest Classifier for a binary classification problem with imbalanced classes (negative class: 80%, positive class: 20%). When I apply the model on the same training dataset or test dataset the proportion of predicted…
0
votes
1 answer

Understand shap values for binary classification

I have trained a CatBoostClassifier on my imbalanced dataset (binary classification). Now, I am trying to interpret the model using the SHAP library. Below is the code to fit the model and calculate shap values: weights = y.value_counts()[0] /…
Dhvani Shah
0
votes
0 answers

Imbalanced dataset and tree-based regressors

I have an imbalanced dataset and need to balance it to predict the target with tree-based regressors like DecisionTreeRegressor. To balance it, I found solutions such as: square root transformation, log transformation, Box-Cox…
Sara