Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs when:

  • "The user assigns more importance to the predictive performance... on a subset of the target variable domain."
  • "[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro (2016). A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.
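A quick way to see whether a training set fits this description is to look at its class frequencies. The sketch below uses made-up labels and reports each class's share together with the imbalance ratio (size of the largest class divided by size of the smallest). The ratio is a common summary statistic, not a definition taken from the survey cited above.

```python
from collections import Counter

def class_distribution(labels):
    """Return each class's share of the data and the majority/minority ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {cls: n / total for cls, n in counts.items()}
    # Imbalance ratio: size of the largest class over size of the smallest class.
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio

# Hypothetical labels: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
shares, ratio = class_distribution(labels)
print(shares)  # {0: 0.9, 1: 0.1}
print(ratio)   # 9.0
```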

351 questions
0 votes, 1 answer

Error: Not a recognized resampling method

I'm having trouble running my model to balance my dataframe. It tells me that the resampling method is not recognized. What do I do? > # Creating the control function for training > ctrl <- trainControl(method = "repeateadcv", + …
0 votes, 0 answers

how to construct stratified tensorflow dataset?

I'm using a custom TensorFlow model for an imbalanced classification problem. For this I need to split the data into a train and a test set and split the train set into batches. However, the batches need to be stratified because of the class imbalance.…
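One framework-agnostic way to get stratified batches is to shuffle the indices of each class separately, cut every class into the same number of chunks, and build batch i from chunk i of every class. The NumPy sketch below assumes the labels fit in memory; the resulting index arrays could then be used to slice the features before passing them to `tf.data.Dataset.from_tensor_slices` (that step is not shown and is only one possible way to wire it up).

```python
import numpy as np

def stratified_batches(y, n_batches, seed=0):
    """Split sample indices into n_batches with roughly the same class proportions."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    batches = [[] for _ in range(n_batches)]
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        # array_split allows class sizes that are not divisible by n_batches.
        for b, chunk in enumerate(np.array_split(idx, n_batches)):
            batches[b].extend(chunk.tolist())
    # Shuffle inside each batch so the classes are not grouped together.
    return [rng.permutation(np.asarray(b, dtype=np.int64)) for b in batches]

# Hypothetical labels: 80 samples of class 0 and 20 of class 1.
y = np.array([0] * 80 + [1] * 20)
for batch in stratified_batches(y, n_batches=4):
    print(len(batch), np.bincount(y[batch]))  # 25 [20  5] for every batch
```

Because each class is split independently, every batch keeps approximately the overall class ratio even when the class sizes do not divide evenly.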
0 votes, 1 answer

R Multilevel Prediction in Tidymodels with Imbalanced Nested Data

Dear all, I hope I can draw on your expertise regarding a prediction task in R/Tidymodels. I intend to predict injuries in runners. The daily/weekly training data on which the predictions are based is nested within the individual…
LB.
0 votes, 1 answer

class_weight attribute of DecisionTreeClassifier having no effect on confusion matrix, recall

I am preparing a demonstration of cost sensitive classification for a lecture and am puzzled as to why the class_weight='balanced' attribute of scikit-learn's DecisionTreeClassifier seems to be having no effect at all. The dataset has 4521…
ktreu
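For reference, the scikit-learn documentation defines the weights used by `class_weight='balanced'` as `n_samples / (n_classes * np.bincount(y))`. These per-class weights rescale each sample's contribution to the tree's impurity criterion. The pure-Python sketch below, using hypothetical labels, reproduces that formula so the weights for a given label distribution can be checked by hand.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weights as documented for scikit-learn's class_weight='balanced':
    n_samples / (n_classes * count_of_class)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in sorted(counts.items())}

# Hypothetical labels with an 80/20 split.
y = ["no"] * 80 + ["yes"] * 20
print(balanced_class_weights(y))  # {'no': 0.625, 'yes': 2.5}
```

With these weights, each class contributes the same total weight (80 × 0.625 = 20 × 2.5 = 50).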
0 votes, 0 answers

Can an array of word vectors be handled by imbalanced-data techniques?

I have an imbalanced text classification dataset, and I used word vectors (word2vec) to embed the text, so the result for each text is an array of word vectors. I have a variable X holding the word-vector arrays and a variable Y holding the class/target of…
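Oversamplers such as SMOTE expect a 2-D numeric matrix with one row per sample. That means a variable-length array of word vectors per text first has to be reduced to a fixed-length vector. The sketch below assumes mean pooling for that reduction (one common choice, not the only one). It then applies a minimal, hand-written SMOTE-style interpolation in NumPy, meant for illustration rather than as a replacement for imbalanced-learn's `SMOTE`.

```python
import numpy as np

def mean_pool(docs):
    """Turn a list of (n_words_i, dim) arrays into one (n_docs, dim) matrix."""
    return np.vstack([np.asarray(d).mean(axis=0) for d in docs])

def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority samples
    and one of their k nearest minority neighbours (Euclidean distance).
    Needs at least two minority samples."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    # Pairwise distances; a point must not be its own neighbour.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

# Hypothetical minority documents with 2-D word vectors of varying length.
docs_min = [np.ones((4, 2)), np.zeros((2, 2)), np.full((3, 2), 2.0)]
X_min = mean_pool(docs_min)  # shape (3, 2)
X_new = smote_like(X_min, n_new=5, k=2)
print(X_new.shape)  # (5, 2)
```

Once X is a 2-D matrix like `X_min` here, library oversamplers can be applied to it in the usual way.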
0 votes, 1 answer

R: Error in model.frame.default(formula = class ~ step + type + amount + :) : object is not a matrix

I am new to R and I am trying to play around with the data from here. I tried to oversample it, but an error occurs in model.frame.default. My first attempt: oversample_data <- ovun.sample(class ~ ., data = sample_dataset, p = 0.5, seed = 1,…
WILLIAM
0 votes, 1 answer

Which metric to use for imbalanced classification problem?

I am working on a classification problem with very imbalanced classes. I have 3 classes in my dataset: class 0, 1 and 2. Class 0 is 11% of the training set, class 1 is 13% and class 2 is 75%. I used a random forest classifier and got 76% accuracy.…
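With class 2 making up about 75% of the data, a model that always predicts class 2 would already score roughly 75% accuracy. That is why per-class recall, balanced accuracy (the mean of per-class recall) and macro-averaged F1 are often reported alongside accuracy. The sketch below computes them from a confusion matrix. The matrix is hypothetical: a 100-sample test set roughly mirroring the question's split, scored by a model that always predicts class 2.

```python
def per_class_report(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    recalls, f1s = [], []
    for c in range(n):
        tp = cm[c][c]
        actual = sum(cm[c])                          # row sum: true members of class c
        predicted = sum(cm[r][c] for r in range(n))  # column sum: predicted as class c
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": sum(cm[c][c] for c in range(n)) / sum(map(sum, cm)),
        "balanced_accuracy": sum(recalls) / n,  # mean of per-class recall
        "macro_f1": sum(f1s) / n,
        "recall_per_class": recalls,
    }

# Hypothetical majority-class predictor on 11 / 13 / 76 samples of classes 0 / 1 / 2.
cm = [[0, 0, 11],
      [0, 0, 13],
      [0, 0, 76]]
report = per_class_report(cm)
print(report["accuracy"], round(report["balanced_accuracy"], 3))  # 0.76 0.333
```

Here the trivial model reaches 76% accuracy but only 33% balanced accuracy, because it never recalls classes 0 and 1.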
0 votes, 1 answer

A problem in using AIF360 metrics in my code

I am trying to run AI Fairness 360 metrics on scikit-learn (imbalanced-learn) algorithms, but I have a problem with my code. The problem is that when I apply scikit-learn (imbalanced-learn) algorithms like SMOTE, they return a numpy array, while AI Fairness…
0 votes, 1 answer

Imbalanced classification using undersampling and oversampling with PyTorch in Python

I want to use oversampling and undersampling techniques together. I have 6 classes with the following numbers of samples: class 0: 250000, class 1: 48000, class 2: 40000, class 3: 38000, class 4: 35000, class 5: 7000. I want to use SMOTE to make all classes…
Shorouk Adel
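Combining the two usually means choosing a target count per class, then undersampling the classes above it and oversampling the classes below it. The sketch below uses the class counts from the question and a hypothetical target of 40,000 per class. To keep the index logic visible, it swaps SMOTE for plain random undersampling and random oversampling (duplication). With imbalanced-learn, per-class targets like `under` and `over` could be passed as `sampling_strategy` dicts to `RandomUnderSampler` and `SMOTE`, respectively, applied to the training split only.

```python
import random

def resample_plan(counts, target):
    """Split classes into those to undersample (above target) and oversample (below)."""
    under = {c: target for c, n in counts.items() if n > target}
    over = {c: target for c, n in counts.items() if n < target}
    return under, over

def resample_indices(y, target, seed=0):
    """Return indices giving exactly `target` samples per class: random
    undersampling without replacement, random oversampling with replacement."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    out = []
    for label, idx in by_class.items():
        if len(idx) >= target:
            out.extend(rng.sample(idx, target))            # drop surplus rows
        else:
            extra = rng.choices(idx, k=target - len(idx))  # duplicate random rows
            out.extend(idx + extra)
    rng.shuffle(out)
    return out

counts = {0: 250_000, 1: 48_000, 2: 40_000, 3: 38_000, 4: 35_000, 5: 7_000}
under, over = resample_plan(counts, target=40_000)
print(under)  # {0: 40000, 1: 40000}
print(over)   # {3: 40000, 4: 40000, 5: 40000}
```

Class 2 already has exactly 40,000 samples, so it appears in neither dict and is left unchanged.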
0 votes, 1 answer

Change learning rate within minibatch - keras

I have a problem with imbalanced labels: for example, 90% of the data have label 0 and the remaining 10% have label 1. I want to train the network with minibatches, so I want the optimizer to give the examples labeled 1 a learning rate (or…
hihi
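A per-example learning rate is awkward to express directly. Under plain SGD, however, scaling an example's loss by a weight w scales that example's gradient by w, which is the same as multiplying its learning rate by w. Keras exposes loss weighting through the `sample_weight` and `class_weight` arguments of `Model.fit`. The NumPy sketch below shows a weighted binary cross-entropy and its gradient for a logistic model. The weight of 9 for label 1 (matching the 90/10 split) is an assumption, and with adaptive optimizers such as Adam the equivalence to a larger learning rate is only approximate.

```python
import numpy as np

def weighted_bce_grad(params, X, y, sample_w):
    """Mean weighted binary cross-entropy of a logistic model and its gradient."""
    p = 1.0 / (1.0 + np.exp(-X @ params))
    eps = 1e-12
    losses = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    loss = np.mean(sample_w * losses)
    # d(loss_i)/d(params) = (p_i - y_i) * x_i, so example i's gradient is scaled by sample_w[i].
    grad = X.T @ (sample_w * (p - y)) / len(y)
    return loss, grad

X = np.array([[1.0, 0.5], [1.0, -1.0]])
y = np.array([1.0, 0.0])
params = np.zeros(2)
weights = np.where(y == 1, 9.0, 1.0)  # hypothetical: upweight label 1 by 90/10
_, g_plain = weighted_bce_grad(params, X, y, np.ones(2))
_, g_weighted = weighted_bce_grad(params, X, y, weights)
```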
0 votes, 0 answers

spaCy 3.1: custom loss function and data augmentation for named entity recognition on imbalanced data

How do I write a custom loss function for named entity recognition on imbalanced data in spaCy v3 and above? My dataset has imbalanced labels. For example, label a has 45000 annotations, while label b has only 4000 annotations. How to…
0 votes, 1 answer

Show the data that is not chosen by the undersampling approach

Is there any way to get/show the data that is not chosen by the undersampling approach?

    rus = RandomUnderSampler(sampling_strategy=1, random_state=41)
    x_res, y_res = rus.fit_resample(x, y)
tassaneel
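imbalanced-learn's `RandomUnderSampler` stores the indices of the rows it kept in the fitted attribute `sample_indices_`. The rows it did not choose are therefore the complement of that array, for example `np.setdiff1d(np.arange(len(y)), rus.sample_indices_)`. So that the example runs without imbalanced-learn installed, the sketch below replaces the sampler with a hand-written random undersampler that returns the kept indices, then takes the complement in the same way.

```python
import numpy as np

def random_undersample_indices(y, seed=41):
    """Keep an equal-sized random subset of every class (the size of the smallest class)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    kept = [rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False) for c in classes]
    return np.sort(np.concatenate(kept))

x = np.arange(20).reshape(10, 2)  # hypothetical features
y = np.array([0] * 7 + [1] * 3)
kept = random_undersample_indices(y)
dropped = np.setdiff1d(np.arange(len(y)), kept)  # rows the undersampler did not choose
x_res, y_res = x[kept], y[kept]
x_dropped, y_dropped = x[dropped], y[dropped]
print(len(kept), len(dropped))  # 6 4
```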
0 votes, 1 answer

scikit-learn StratifiedShuffleSplit does not work when one of the classes has just one instance

I am trying to split my dataset into a train and a test set using scikit-learn's StratifiedShuffleSplit, but it does not work because one of the classes has just one instance. It would be okay if that one instance went into either the train or…
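Stratified splitting needs at least two members per class. One workaround, assuming it is acceptable for the singleton to land in the training set as the question suggests, is to set such classes aside, stratify the remaining rows, and append the set-aside rows to the training indices. The NumPy sketch below writes the stratified split by hand so the handling of rare classes is explicit. With scikit-learn, one would instead run StratifiedShuffleSplit on the rows whose class has two or more members and add the remaining rows to the training set afterwards.

```python
import numpy as np

def stratified_split_with_singletons(y, test_size=0.25, seed=0):
    """Per-class shuffle split; classes with a single member go to the training set."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    train, test = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        if len(idx) < 2:
            train.extend(idx.tolist())  # a class with one instance cannot be stratified
            continue
        # Keep at least one row of the class in test and at least one in train.
        n_test = min(max(1, round(len(idx) * test_size)), len(idx) - 1)
        test.extend(idx[:n_test].tolist())
        train.extend(idx[n_test:].tolist())
    return np.array(sorted(train)), np.array(sorted(test))

y = np.array(["a"] * 8 + ["b"] * 4 + ["c"])  # hypothetical labels; "c" has one instance
train_idx, test_idx = stratified_split_with_singletons(y)
print(y[test_idx])  # ['a' 'a' 'b']
```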
0 votes, 1 answer

Cross-validation with class imbalance

I am trying to train XGBoost in a binary classification setting, with positive to negative instances at a 1:5 ratio. My problem is similar to cancer detection, i.e. FNs are much more costly than FPs. After quite a bit of reading, I am…
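Two points commonly come up in this setting. First, stratified folds keep the 1:5 ratio in every fold. Second, any resampling should be applied only to the training part of each fold, so the validation fold keeps the real class ratio. For XGBoost specifically, its parameter documentation suggests `scale_pos_weight` of about sum(negative instances) / sum(positive instances) as an alternative to resampling. The plain-Python sketch below, with hypothetical labels, builds stratified folds and oversamples positives only inside each training fold; fitting the model itself is left out.

```python
import random

def stratified_folds(y, k=5, seed=0):
    """Assign every index to one of k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def cv_splits_with_oversampling(y, k=5, seed=0):
    """Yield (train_idx, val_idx); positives are duplicated only in train_idx."""
    rng = random.Random(seed)
    folds = stratified_folds(y, k, seed)
    for f in range(k):
        val = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        pos = [i for i in train if y[i] == 1]
        neg = [i for i in train if y[i] == 0]
        # Random oversampling: duplicate positives until they match the negatives.
        extra = rng.choices(pos, k=max(0, len(neg) - len(pos))) if pos else []
        yield train + extra, val

y = [1] * 10 + [0] * 50  # hypothetical 1:5 positive:negative labels
scale_pos_weight = y.count(0) / y.count(1)  # 5.0, the XGBoost rule of thumb
for train_idx, val_idx in cv_splits_with_oversampling(y):
    # Validation folds keep the original 1:5 ratio; training folds are balanced.
    print(sum(y[i] for i in val_idx), len(val_idx))  # 2 12 for every fold
```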
0 votes, 1 answer

how to import "balanced_batch_generator"?

I want to import balanced_batch_generator with the code below, but it gives the error shown. I wonder if someone can help me. Error: AttributeError: module 'keras.utils' has no attribute 'Sequence'

    from imblearn.keras import balanced_batch_generator
afrah