Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs when:

  • "The user assigns more importance to the predictive performance... on a subset of the target variable domain."
  • "[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro (2016). A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.

351 questions
3 votes, 1 answer

Stratified train-test split of a TensorFlow dataset

I am currently working with a fairly large image dataset, which I loaded using ImageDataGenerator from tensorflow.keras in Python. As the class distribution of my data is very imbalanced, I wanted to do a stratified train-test split to possibly achieve a…
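A stratified split keeps each class's share identical in the train and test portions. One common approach, assuming the labels can be read up front (for example from class sub-directory names), is to split the file indices before building the input pipeline; scikit-learn's `train_test_split(..., stratify=labels)` does this for index arrays. The pure-Python sketch below shows the idea; `stratified_split` is a hypothetical helper, not a TensorFlow API.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)          # group sample indices by class
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                 # shuffle within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels, e.g. read from class sub-directory names.
labels = ["cat"] * 90 + ["dog"] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(len(train_idx), len(test_idx))               # 80 20
print(sum(labels[i] == "dog" for i in test_idx))   # 2
```

The resulting index lists can then select the file paths and labels that feed each input pipeline.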
3 votes, 1 answer

Multi-label imbalanced dataset classification

I am currently working on a multi-label fashion-item dataset, which is highly imbalanced. I tried using class_weights to tackle it, but the accuracy is still stuck at 0.7556 every epoch. Is there any way I can avoid this problem? Did I implement the…
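In a multi-label setup each label is its own binary problem, so one hedged alternative to a single `class_weights` dictionary is to scale the positive term of each label's binary cross-entropy by that label's negative/positive ratio. This is the same idea as the `pos_weight` argument of `tf.nn.weighted_cross_entropy_with_logits`. The helpers `pos_weights` and `weighted_bce` below are hypothetical pure-Python illustrations, not Keras APIs.

```python
import math

def pos_weights(multi_hot):
    """Per-label weight n_negative / n_positive, computed from 0/1 label rows."""
    n_rows = len(multi_hot)
    weights = []
    for j in range(len(multi_hot[0])):
        n_pos = sum(row[j] for row in multi_hot)
        weights.append((n_rows - n_pos) / max(n_pos, 1))  # max() guards labels with no positives
    return weights

def weighted_bce(y_true, y_prob, weights, eps=1e-7):
    """Binary cross-entropy averaged over labels; positive terms scaled by weights."""
    total = 0.0
    for y, p, w in zip(y_true, y_prob, weights):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(w * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical multi-hot labels for 4 items and 3 labels.
labels = [[1, 0, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
weights = pos_weights(labels)
print(weights)  # [1.0, 3.0, 3.0]
```

Missing a positive on a rare label now costs more than missing one on a common label, which pushes the model away from predicting "absent" everywhere.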
3 votes, 1 answer

How do I know the order of the classes in CatBoost classifier weights?

This is a pretty dumb question, but I couldn't find the answer anywhere, so I will take my chances here... I'm building a classifier using CatBoost. Since this is an NLP problem, my features are the words/tokens in the tweet and the target is the…
Yuxxxxxx
3 votes, 7 answers

Cannot import name 'available_if' from 'sklearn.utils.metaestimators'

While importing "from imblearn.over_sampling import SMOTE", I get an import error. Please check and help. I tried upgrading sklearn, but the upgrade was rolled back with an 'OSError'. First, I installed imbalanced-learn through pip: !pip install -U…
Piyush
3 votes, 1 answer

Specifying class or sample weights in Keras for one-hot encoded labels in a TF Dataset

I am trying to train an image classifier on an unbalanced training set. In order to cope with the class imbalance, I want either to weight the classes or the individual samples. Weighting the classes does not seem to work. And somehow for my setup I…
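Keras `Model.fit` accepts a `tf.data.Dataset` that yields `(x, y, sample_weight)` tuples, so one hedged workaround when `class_weight` does not fit the setup is to derive a per-sample weight from each one-hot label. The sketch below shows that mapping in pure Python using the "balanced" formula n / (k · count_c); `sample_weights_from_one_hot` is a hypothetical helper.

```python
from collections import Counter

def sample_weights_from_one_hot(one_hot_rows):
    """Give every sample the balanced weight of its class, n / (k * count_c)."""
    class_ids = [row.index(max(row)) for row in one_hot_rows]  # argmax of each one-hot row
    counts = Counter(class_ids)
    n, k = len(class_ids), len(counts)
    class_weight = {c: n / (k * m) for c, m in counts.items()}
    return [class_weight[c] for c in class_ids]

# Hypothetical one-hot labels: four samples of class 0, one of class 1.
rows = [[1, 0]] * 4 + [[0, 1]]
print(sample_weights_from_one_hot(rows))  # [0.625, 0.625, 0.625, 0.625, 2.5]
```

Inside a TensorFlow pipeline, the same lookup could be done in `Dataset.map` with `tf.argmax` on the label and `tf.gather` into a constant weight vector.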
3 votes, 1 answer

Calculating micro F1 score in Keras

I have a dataset with 15 imbalanced classes and am trying to do multi-label classification with Keras. I am trying to use the micro F1 score as a metric. My model: # Create a VGG instance model_vgg = tf.keras.applications.VGG19(weights = 'imagenet',…
Anakin Skywalker
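Micro F1 pools true positives, false positives and false negatives across every label and sample before computing a single F1, so it should be computed over the whole evaluation set; averaging per-batch F1 values gives a different number because F1 does not decompose over batches. The pure-Python sketch below shows the pooling; `sklearn.metrics.f1_score(y_true, y_pred, average="micro")` on multi-label indicator arrays can serve as a reference.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label 0/1 rows: pool TP/FP/FN over every label."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += t * p            # predicted 1, actually 1
            fp += (1 - t) * p      # predicted 1, actually 0
            fn += t * (1 - p)      # predicted 0, actually 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
print(round(micro_f1(y_true, y_pred), 4))  # 0.6667
```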
3 votes, 1 answer

Dealing with class imbalance with mlr3

Lately I have been advised to switch my machine learning framework to mlr3, but I am finding the transition more difficult than I initially expected. In my current project, I am dealing with highly imbalanced data, which I would like to…
Radbys
3 votes, 3 answers

Imbalanced Image Dataset (TensorFlow 2)

I'm working on a binary image classification problem. The two classes (~590 and ~5900 instances for class 1 and class 2, respectively) are heavily skewed but still quite distinct. Is there any way I can fix this? I want to try SMOTE/random…
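SMOTE creates synthetic minority samples by picking a minority point, choosing one of its k nearest minority neighbours, and interpolating a random fraction of the way between them. The pure-Python sketch below shows that core step on small feature vectors; it is an illustration, not imbalanced-learn's `SMOTE` implementation. For raw images, pixel-space interpolation often produces unrealistic samples, so augmentation or class weighting are common alternatives.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points (the core SMOTE interpolation step)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of x (brute force, Euclidean distance)
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical 2-D minority points; needs at least two of them.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(len(smote(minority, n_new=5)))  # 5
```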
3 votes, 2 answers

Why are the overall precision and recall almost the same as the precision and recall of the underrepresented class?

I have a binary classification problem in which one class is roughly 0.1 times the size of the other. I am using sklearn to create a model and evaluate it. I am using these two functions: print(precision_recall_fscore_support(y_real,y_pred)) out:…
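One possible explanation, assuming the minority class is labelled 1: scikit-learn's `precision_score`, `recall_score` and `f1_score` default to `average="binary"` with `pos_label=1`, so their single "overall" number is exactly the positive class's score, while `precision_recall_fscore_support` defaults to `average=None` and returns one value per class. The pure-Python sketch below computes the per-class values; `per_class_precision_recall` is a hypothetical helper.

```python
def per_class_precision_recall(y_true, y_pred, labels=(0, 1)):
    """Precision and recall for each class, like average=None in scikit-learn."""
    result = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        actual = sum(t == c for t in y_true)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        result[c] = (precision, recall)
    return result

# Hypothetical data: 10 majority (0) and 2 minority (1) samples.
y_true = [0] * 10 + [1] * 2
y_pred = [0] * 9 + [1] + [1, 0]
print(per_class_precision_recall(y_true, y_pred))  # {0: (0.9, 0.9), 1: (0.5, 0.5)}
```

With `average="binary"`, only the `1: (0.5, 0.5)` entry would be reported.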
3 votes, 1 answer

Is it okay to build a model on imbalanced data?

Background: The dataset I am working on is highly imbalanced and has 543 classes. The data is bounded by date. After exploring the data over a span of 5 years, I found that the imbalance is inherent and persistent. The test data…
learnToCode
3 votes, 0 answers

Multi-label classification and stratified sampling: different preparations of the target value give different results

I have a dataset that looks like this:

       Clean_Tweet                                      cEXT  cNEU  cAGR  cCON  cOPN
    0  thanks questions watch season premiere tonight   0     1     1     1     0
    1  couple films…
3 votes, 0 answers

Combating class imbalance with the right loss function: IoU, Dice or 2-class Dice?

I am currently working on my Bachelor's thesis and am having some difficulty understanding the differences between loss functions with respect to class imbalance, and class imbalance itself. I am working on a segmentation problem with a variation of…
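For reference, Dice = 2|P ∩ G| / (|P| + |G|) and IoU (Jaccard) = |P ∩ G| / |P ∪ G|; the "soft" versions used as losses replace set sizes with sums of predicted probabilities. The sketch below assumes "2-class Dice" means averaging the Dice loss over the foreground and background channels; that reading is an assumption. Its example shows why including the dominant background class can reward a model that predicts background everywhere.

```python
def soft_dice_loss(probs, targets, eps=1e-6):
    """1 - Dice, with |P ∩ G| replaced by sum(p * g) over pixels."""
    inter = sum(p * g for p, g in zip(probs, targets))
    return 1 - (2 * inter + eps) / (sum(probs) + sum(targets) + eps)

def soft_iou_loss(probs, targets, eps=1e-6):
    """1 - IoU (Jaccard), with the union written as |P| + |G| - |P ∩ G|."""
    inter = sum(p * g for p, g in zip(probs, targets))
    union = sum(probs) + sum(targets) - inter
    return 1 - (inter + eps) / (union + eps)

def two_class_dice_loss(probs, targets, eps=1e-6):
    """Average of foreground and background Dice losses (assumed meaning of '2-class Dice')."""
    fg = soft_dice_loss(probs, targets, eps)
    bg = soft_dice_loss([1 - p for p in probs], [1 - g for g in targets], eps)
    return (fg + bg) / 2

# Hypothetical 8-pixel mask with one foreground pixel; the model predicts all background.
targets = [1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.0] * 8
print(round(soft_dice_loss(probs, targets), 3))       # 1.0
print(round(soft_iou_loss(probs, targets), 3))        # 1.0
print(round(two_class_dice_loss(probs, targets), 3))  # 0.533
```

The foreground-only losses treat the all-background prediction as a complete miss, while averaging in the background channel roughly halves the penalty.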
3 votes, 2 answers

Process for oversampling data for imbalanced binary classification

I have roughly a 30%/70% split between class 0 (minority class) and class 1 (majority class). Since I do not have a lot of data, I am planning to oversample the minority class to reach a 50-50 split. I was wondering if…
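A commonly recommended order is to split the data first and oversample only the training portion. Oversampling before the split lets duplicated minority rows land in both train and test, which inflates evaluation scores. The pure-Python sketch below shows random oversampling of a binary training split; imbalanced-learn's `RandomOverSampler(...).fit_resample(X, y)` provides the same operation. `random_oversample` is a hypothetical helper.

```python
import random

def random_oversample(X, y, minority_label, seed=0):
    """Duplicate random minority rows until both classes have equal counts (binary y).
    Intended for the training split only, after train/test splitting."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority_count = len(y) - len(minority)
    extra = [rng.choice(minority) for _ in range(majority_count - len(minority))]
    X_res = list(X) + [X[i] for i in extra]
    y_res = list(y) + [y[i] for i in extra]
    return X_res, y_res

# Hypothetical training split with a 30%/70% class ratio.
X_train = list(range(10))
y_train = [0] * 3 + [1] * 7
X_res, y_res = random_oversample(X_train, y_train, minority_label=0)
print(len(X_res), y_res.count(0), y_res.count(1))  # 14 7 7
```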
2 votes, 3 answers

How to deal with very imbalanced classes when doing NLP classification?

I'm working on an NLP classification problem and I noticed a huge disparity between classes. I'm working with a dataset of ~44k observations with 99 labels. Out of those 99 labels, only 21 have more than 500 observations and some…
wageeh
2 votes, 0 answers

How to implement undersampling techniques like NearMiss, TomekLinks, ClusterCentroids, ENN using PySpark?

I'm trying to work on the Credit Card Transactions Fraud Detection Dataset from Kaggle. I'm using PySpark and wish to apply undersampling techniques in PySpark. However, I can't find any articles or documentation that…
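Tomek-link undersampling removes the majority member of every pair of opposite-class points that are each other's nearest neighbour. The single-machine, pure-Python sketch below shows what has to be computed; it is not PySpark code. Its brute-force neighbour search is O(n²), which is the main reason these methods do not translate directly to Spark. For plain random undersampling, PySpark's `DataFrame.sampleBy(col, fractions, seed)` performs per-class sampling.

```python
import math

def tomek_links(X, y):
    """Index pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i), key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]  # brute force, O(n^2)
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]

def remove_tomek_majority(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = {i if y[i] == majority_label else j for i, j in tomek_links(X, y)}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

# Hypothetical 1-D data: label 1 is the minority (fraud) class.
X = [(0.0,), (1.0,), (10.0,), (10.5,), (20.0,)]
y = [0, 0, 0, 1, 0]
print(tomek_links(X, y))                             # [(2, 3)]
print(remove_tomek_majority(X, y, majority_label=0))
# ([(0.0,), (1.0,), (10.5,), (20.0,)], [0, 0, 1, 0])
```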