Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs when:

  • "The user assigns more importance to the predictive performance... on a subset of the target variable domain."
  • "[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro (2016). A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.
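The definition above is about relevance, not raw counts, but a usual first check is how poorly the relevant classes are represented. Below is a minimal plain-Python sketch. The helper name `class_distribution` is hypothetical, and summarizing imbalance as the largest-to-smallest class count ratio is only one convention.

```python
from collections import Counter

def class_distribution(labels):
    """Return each class's share of the labels and the
    largest-count / smallest-count ratio (one common imbalance summary)."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: count / total for label, count in counts.items()}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return shares, imbalance_ratio

shares, ratio = class_distribution(["A"] * 80 + ["B"] * 20)
print(shares, ratio)  # {'A': 0.8, 'B': 0.2} 4.0
```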

Software

Related Tags and Techniques

351 questions
-1
votes
0 answers

Effect of imbalanced data set on a regression problem

I want to assess the effect of an imbalanced data set on a regression problem, but I don't know how I should examine the effect. Which metric can help me see the effect of an imbalanced data set on a regression problem? I used MSE and MAPE to compare the…
negin
  • 1
  • 1
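A single MSE or MAPE over the whole test set can hide poor performance on a rare part of the target range. In the spirit of the definition at the top of this page (performance on a subset of the target domain), one option is to report the error separately per target range. A minimal sketch; the helper name `binned_errors` and the bin edges are hypothetical choices, not a standard.

```python
def binned_errors(y_true, y_pred, edges):
    """Report count, MSE and MAE separately for each target range
    [edges[i], edges[i+1]), so errors on rare ranges are not averaged away."""
    report = {}
    for low, high in zip(edges[:-1], edges[1:]):
        pairs = [(t, p) for t, p in zip(y_true, y_pred) if low <= t < high]
        if not pairs:
            continue  # no test cases fall in this range
        errors = [p - t for t, p in pairs]
        report[(low, high)] = {
            "n": len(pairs),
            "mse": sum(e * e for e in errors) / len(errors),
            "mae": sum(abs(e) for e in errors) / len(errors),
        }
    return report

# Three common small targets and one rare large target.
print(binned_errors([1, 2, 3, 50], [1, 2, 4, 30], [0, 10, 100]))
```

Here the overall MSE would be about 100, almost entirely driven by the single case in the rare range, which the per-range report makes explicit.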
-1
votes
1 answer

How to Handle Imbalanced Data in a Classification Problem?

I am working on a binary classification problem using machine learning, where my target classes are imbalanced. I have approximately 80% of data points in Class A and only 20% in Class B. I have tried using various classifiers like Random Forest and…
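Before resampling, a common first step is to weight the classes in the loss. scikit-learn's `RandomForestClassifier` accepts `class_weight="balanced"`, which sets each class's weight to n_samples / (n_classes * count). The plain-Python sketch below only reproduces that formula so the numbers for an 80/20 split are visible; the helper name is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight(c) = n_samples / (n_classes * count(c)), the heuristic
    behind scikit-learn's class_weight="balanced"."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

print(balanced_class_weights(["A"] * 80 + ["B"] * 20))
# {'A': 0.625, 'B': 2.5}
```

Each class then carries the same total weight (80 * 0.625 = 20 * 2.5 = 50).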
-1
votes
1 answer

Classification ML Model Training with Unbalanced Dataset

I am trying to do classification with machine learning. I have "good" and "bad" classes in my dataset. Dataset shape: (248857, 12). Due to some conditions, I am not able to collect more "good" class results; there are around 40k good and 210k bad…
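When more "good" rows cannot be collected, one common option is to train on a random subset of the majority class. A minimal plain-Python sketch, assuming rows and labels are parallel lists; `undersample_majority` is a hypothetical name (imbalanced-learn's `RandomUnderSampler` does the same job on arrays). It should be applied to the training split only, and it discards data: with roughly 40k vs 210k rows, about 170k "bad" rows would be dropped.

```python
import random

def undersample_majority(rows, labels, seed=0):
    """Random undersampling: keep a random subset of every class so that
    all classes end up with as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    pairs = [(row, label)
             for label, members in by_class.items()
             for row in rng.sample(members, target)]
    rng.shuffle(pairs)  # avoid returning the rows grouped by class
    return [row for row, _ in pairs], [label for _, label in pairs]
```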
-1
votes
1 answer

Average precision score too high looking at the confusion matrix

I am developing a scikit-learn machine learning model on an imbalanced dataset (binary classification). Looking at the confusion matrix and the F1 score, I expect a lower average precision score, but I get an almost perfect score and I can't figure…
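Average precision summarizes how well the scores rank positives above negatives across all thresholds, while a confusion matrix (and an F1 computed from `predict()`) reflects a single cut-off, typically 0.5 on predicted probabilities. A perfect ranking with badly placed scores can therefore give AP = 1 and F1 = 0. A plain-Python sketch; the helper names are hypothetical, and the AP function assumes distinct scores, in which case it equals the step-wise sum of precision times recall increments.

```python
def average_precision(y_true, scores):
    """Mean of the precision measured at the rank of each positive.
    Assumes distinct scores (ties are not handled)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    tp, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at this cut-off
    return total / n_pos

def f1_at_threshold(y_true, scores, threshold=0.5):
    """F1 for class 1 when predicting 1 for score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0

# Positives are ranked first, but every score is below 0.5.
y = [1, 1, 0, 0, 0, 0]
s = [0.4, 0.35, 0.2, 0.1, 0.05, 0.01]
print(average_precision(y, s), f1_at_threshold(y, s))  # 1.0 0.0
```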
-1
votes
1 answer

TypeError: fit_resample() missing 1 required positional argument: 'y'

Using imblearn for imbalanced datasets, the parameters seem to have changed. I am using under_sampling.NearMiss. Here is the code:
    from imblearn import under_sampling
    balanced = under_sampling.NearMiss()
    X_res, y_res =…
Vishal Rana
  • 119
  • 1
  • 7
-1
votes
1 answer

Binary Classification Problem: How to Proceed With Severe Data Imbalance?

The Problem: After pre-processing a raw dataset, I obtained a clean but severely imbalanced dataset with 341 observations with label 1 and 3 observations with label 0 (more details about the dataset at the bottom). Dataset shape: (344, 1500) …
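With 341 vs 3 observations, a model that always predicts label 1 is already about 99% accurate while never detecting label 0, so accuracy alone cannot show whether anything was learned. A minimal plain-Python sketch of that baseline; the helper name is hypothetical.

```python
from collections import Counter

def majority_baseline(labels):
    """Label and accuracy of a 'model' that always predicts the most common label."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

label, accuracy = majority_baseline([1] * 341 + [0] * 3)
print(label, round(accuracy, 4))  # 1 0.9913
```

Any real model should be compared against this baseline on metrics that look at label 0 directly, such as its recall.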
-1
votes
1 answer

Dealing with extreme imbalance with xgboost

My training data has extreme class imbalance {0: 872525, 1: 3335} with 100 features. I use xgboost to build a classification model, with Bayesian optimisation to tune the hyperparameters in the range { learning rate:(0.001,0.1), min_split_loss:(0.10), …
zonna
  • 46
  • 1
  • 9
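For XGBoost, the documentation of the `scale_pos_weight` parameter suggests sum(negative instances) / sum(positive instances) as a typical value to consider; it can serve as a starting point or as a bound in the Bayesian optimisation search. A minimal sketch of the arithmetic for these counts; the helper name is hypothetical.

```python
def suggested_scale_pos_weight(n_negative, n_positive):
    """Ratio of negative to positive instances, the typical starting value
    for XGBoost's scale_pos_weight."""
    return n_negative / n_positive

counts = {0: 872525, 1: 3335}
w = suggested_scale_pos_weight(counts[0], counts[1])
print(round(w, 2))  # 261.63
```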
-1
votes
1 answer

How to interpret column matrix to find best model for imbalanced dataset?

I am trying to do binary classification, and my dataset is imbalanced with a 1:7 ratio. I have 1000 "1" labels and 6990 "0" labels. Predicting "1" labels is more important than "0", but it should still detect "0" labels correctly as much as…
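Reading the confusion matrix as a few rates makes models easier to compare. Since "1" matters more but "0" should still be detected, one option is to rank models by an F-beta score with beta > 1 (which weights recall on "1" above precision) while also tracking specificity (recall on "0"). A plain-Python sketch; beta = 2 and the helper name are assumptions, not a fixed rule.

```python
def binary_report(tp, fp, fn, tn, beta=2.0):
    """Rates for the positive class "1" from confusion-matrix counts.
    beta > 1 weights recall more than precision in the F-beta score."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall of class "0"
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
        f"f{beta:g}": f_beta,  # key "f2" for the default beta
    }

print(binary_report(tp=80, fp=20, fn=20, tn=880))
```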
-1
votes
1 answer

Tokenization of unbalanced dataset

I'm working with a dataset of email contents that I want to transform with doc2vec. This is a labeled dataset (spam/not-spam) and it is unbalanced (90-10 ratio). My question is: when tokenizing the emails' content, should I first oversample (using…
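A common answer to the ordering question is: split first, then resample only the training part, so that copies of one email never end up on both sides of the split. Fitting the tokenizer or doc2vec model on the training split only follows the same logic. A plain-Python sketch using random duplication (random oversampling, not SMOTE); the function name, the 20% test fraction and the seed are illustrative assumptions, and this split is not stratified.

```python
import random

def split_then_oversample(texts, labels, test_fraction=0.2, seed=0):
    """Hold out a test set first, then randomly duplicate minority-class
    examples in the training part only, so no test email has a copy in training."""
    rng = random.Random(seed)
    indices = list(range(len(texts)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Group training indices by class, then top each class up to the largest one.
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    largest = max(len(members) for members in by_class.values())
    train_out = []
    for members in by_class.values():
        extra = [rng.choice(members) for _ in range(largest - len(members))]
        train_out.extend(members + extra)
    rng.shuffle(train_out)

    train = [(texts[i], labels[i]) for i in train_out]
    test = [(texts[i], labels[i]) for i in test_idx]
    return train, test
```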
-1
votes
2 answers

keras imbalance data training weight adjustment

I am interested in training a regression model to predict price (a numeric value). I have two data sources: one from 2019 and one from 2020. The 2019 source has over 3 times more data than 2020. I know I can oversample to adjust this imbalance…
pict
  • 1
  • 1
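Rather than oversampling 2020, another option is per-row weights that give each year the same total weight; Keras's `Model.fit` accepts such an array through its `sample_weight` argument. A plain-Python sketch of the weight computation; `equal_source_weights` is a hypothetical helper, and whether both years should count equally (rather than, say, favoring recent data) is a modeling assumption.

```python
from collections import Counter

def equal_source_weights(sources):
    """One weight per row so that each source (e.g. year) contributes the same
    total weight: weight = n_rows / (n_sources * rows_in_that_source)."""
    counts = Counter(sources)
    n_rows, n_sources = len(sources), len(counts)
    return [n_rows / (n_sources * counts[s]) for s in sources]

weights = equal_source_weights([2019] * 300 + [2020] * 100)
```

With 300 rows from 2019 and 100 from 2020, each 2019 row gets about 0.667 and each 2020 row gets 2.0, so both years sum to 200.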
-1
votes
1 answer

Handling imbalanced time series data

I have time-series sensor data:
+----+----------+----------+------+
|day |Feature 1 |Feature 2 |target|
+----+----------+----------+------+
|0   |0.2       |0.1       |0.01  |
+----+----------+----------+------+
... until day 30. I've built…
Shlomi Schwartz
  • 8,693
  • 29
  • 109
  • 186
-1
votes
2 answers

How to Classify an Imbalanced Dataset Using SVM

I am using an SVM, and my dataset is imbalanced. I got a result in which Class 0 was classified as 99% and Class 1 as 1%. Is there any way to correctly classify an imbalanced dataset using SVM?
Anki
  • 13
  • 4
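Two common levers for an SVM on imbalanced data are class weighting (scikit-learn's `SVC` accepts `class_weight="balanced"`) and moving the decision threshold away from 0 on the `decision_function` scores. A plain-Python sketch of the second; `best_threshold` is a hypothetical helper, F1 on class 1 is an assumed target, and the threshold should be chosen on a validation set rather than the test set.

```python
def best_threshold(y_true, scores):
    """Try each observed score as a cut-off (predict 1 when score >= cut-off)
    and keep the one with the highest F1 for class 1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < t)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# All scores are negative, so the default cut-off of 0 predicts only class 0.
t, f1 = best_threshold([0, 0, 0, 0, 1, 1], [-2.0, -1.5, -1.2, -0.9, -0.5, -0.3])
print(t, f1)  # -0.5 1.0
```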
-1
votes
1 answer

Multiclass classification imbalance

I have 5 different labels, with the following class frequencies: '0': 23.21%, '1': 17.64%, '2': 29.64%, '3': 16.96%, '4': 12.57%. How can I evaluate whether this can badly affect my predictions? I have ~1800 records with 28 features each. I…
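One way to judge whether a mild imbalance like this hurts is to compare overall accuracy with per-class recall and macro-F1 (the unweighted mean of per-class F1). If the smallest class ('4', 12.57%) has clearly lower recall than the others, imbalance, or that class simply being harder, is a likely factor. A plain-Python sketch; the helper name is hypothetical.

```python
def per_class_scores(y_true, y_pred):
    """Per-class recall and F1, plus accuracy and macro-F1
    (unweighted mean of the per-class F1 scores)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        scores[c] = {
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        }
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(s["f1"] for s in scores.values()) / len(classes)
    return scores, accuracy, macro_f1

# A model that ignores the minority class still reaches 75% accuracy.
print(per_class_scores([0] * 6 + [1] * 2, [0] * 8))
```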
-1
votes
1 answer

Meaning of Balanced datasets

I am researching audio classification, more specifically balanced vs. imbalanced audio datasets. Suppose I have two folders, one per class: Car sounds and Motorcycle sounds; the car class folder has 1000 .wav…
dani
  • 21
  • 7
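Here "balanced" just means every class has roughly the same number of examples, so with one folder per class it can be checked by counting files. A sketch assuming a layout like root/car/*.wav and root/motorcycle/*.wav; the helper names and the `tolerance` parameter are hypothetical.

```python
from pathlib import Path

def files_per_class(root, pattern="*.wav"):
    """Count files matching pattern in each immediate subfolder of root,
    treating each subfolder name as a class label."""
    return {
        folder.name: sum(1 for _ in folder.glob(pattern))
        for folder in sorted(Path(root).iterdir())
        if folder.is_dir()
    }

def is_balanced(counts, tolerance=0.0):
    """True if the smallest class has at least (1 - tolerance) times
    as many files as the largest class."""
    smallest, largest = min(counts.values()), max(counts.values())
    return smallest >= (1 - tolerance) * largest
```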
-1
votes
2 answers

May I know the correct way of handling an imbalanced dataset?

I am new to data science and am here to clarify some doubts. I have an imbalanced dataset with 3 classes, labeled 1, 2 and 3. Class '2' is the majority (56.89%), '1' makes up 9.6% and '3' makes up 33.4%. May I know what is the correct…
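There is no single correct way, but one step shared by most approaches is to keep the class proportions (about 57/10/33 here) the same in the training and test splits, so the small class '1' is still represented when evaluating. scikit-learn's `train_test_split(..., stratify=y)` does this; below is a plain-Python sketch of the idea with a hypothetical helper name.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that each class appears in the test
    set in (approximately) the same proportion as in the full dataset."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)  # per-class test quota
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)
```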