Questions tagged [imbalanced-data]

Problem definition

Imbalanced data occurs in machine learning when:

"The user assigns more importance to the predictive performance... on a subset of the target variable domain."

"[T]he cases that are more relevant for the user are poorly represented in the training set."

Paula Branco, Luís Torgo, and Rita P. Ribeiro. (2016) A Survey of Predictive Modeling on Imbalanced Domains. ACM Computing Surveys, Volume 49, Issue 2.

Software

imbalanced-learn: imblearn

Related Tags and Techniques

SMOTE: smote (Synthetic Minority Oversampling Technique)
Resampling: resampling
Oversampling: oversampling
Downsampling: downsampling
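
The difference between the oversampling and downsampling tags above can be illustrated with a minimal, standard-library-only sketch. The function name `random_resample` and its `mode` and `seed` parameters are hypothetical, chosen for this illustration; this is not the imbalanced-learn API, and it is not SMOTE, which synthesizes new minority points by interpolation rather than duplicating existing ones.

```python
import random
from collections import Counter, defaultdict


def random_resample(rows, labels, mode="over", seed=0):
    """Balance classes by random oversampling or downsampling.

    mode="over": duplicate random minority rows until every class
                 matches the size of the largest class.
    mode="down": keep a random subset of each class, sized to the
                 smallest class.
    """
    if mode not in ("over", "down"):
        raise ValueError("mode must be 'over' or 'down'")

    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    sizes = [len(members) for members in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if mode == "over":
            # Keep every original row, then add random duplicates.
            extra = [rng.choice(members) for _ in range(target - len(members))]
            chosen = members + extra
        else:
            # Sample without replacement down to the target size.
            chosen = rng.sample(members, target)
        out_rows.extend(chosen)
        out_labels.extend([label] * len(chosen))
    return out_rows, out_labels


# Toy data: 8 rows of class 0 and 2 rows of class 1.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

over_rows, over_labels = random_resample(rows, labels, mode="over")
down_rows, down_labels = random_resample(rows, labels, mode="down")
print(Counter(over_labels))
print(Counter(down_labels))
```

Resampling is normally applied to the training split only, so that duplicated rows do not leak into validation or test data.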

351 questions

votes

1 answer

How to handle imbalanced dataset for CheXpert data on a classification problem from radiography images

I am working on an image classification problem, using CNNs and DNNs to be more specific. But the data at hand is highly imbalanced and hence gives highly skewed results: the model predicts everything as true or everything as false. I have tried the…

keras deep-learning conv-neural-network imbalanced-data

asked Nov 21 '19 at 04:01

Aditya Bhattacharya

votes

1 answer

Problem with Over- and Under-Sampling with ROSE in R

I have a dataset to classify between won cases (14399) and lost cases (8677). The dataset has 912 predictor variables. I am trying to oversample the lost cases in order to reach almost the same number as the won cases (so having 14399 cases for…

r machine-learning oversampling imbalanced-data

asked Nov 08 '19 at 12:48

user8383689

votes

1 answer

Imbalance in multi class classification problem - four target levels

I have an imbalance in my data, as shown below. Whenever I try ADASYN it shows an error. Do we need to provide any parameter for it? Sometimes it runs for a long time with no response, even after 40 minutes of running the code. …

python-3.x machine-learning classification imbalanced-data smote

asked Nov 05 '19 at 04:00

Ayyasamy

votes

0 answers

Error: "missing values are not allowed in subscripted assignments of data frames"

I am new to R and I am writing R code for a personal project/exercise. The data I am using comes from a survey on the ethnic identity of people from Hong Kong. I used 2019 data from http://data.hkupop.hku.hk/v3/hkupop/ethnic_identity/ch.html. After…

r imbalanced-data smote

asked Oct 25 '19 at 09:17

Hyelim_kim1028

votes

2 answers

How to deal with the Rasa NLU data imbalance problem?

I now have 12 intents to identify, but the amount of data per intent is inconsistent. Intents such as meeting settings and reminders have thousands of examples, but intents such as greetings and thanks have very few…

embedding rasa-nlu rasa-core rasa imbalanced-data

asked Oct 19 '19 at 10:12

shaojie

votes

1 answer

For an imbalanced dataset, is it better to use oversampling or undersampling techniques?

I have a binary classification problem where the dataset is imbalanced, and I don't know whether to use undersampling or oversampling.

machine-learning classification data-science imbalanced-data

asked Oct 18 '19 at 10:04

Sarray Thamer

votes

0 answers

Is there a more efficient way to oversample data than random.sample()?

I have a large unbalanced classification problem and want to address it by oversampling the minority classes (N(class 1) = 8.5 million, N(class n) = 3,000). For that purpose I want to get 100,000 samples for each of the n classes by data_oversampled =…

random classification oversampling imbalanced-data

asked Oct 10 '19 at 18:59

Quastiat

votes

2 answers

Passing a list as loss_weights, it should have one entry per model output. Keras tells me that the model has 1 output but I thought having more

I have a dataset df for a multiclass classification problem. I have a huge class imbalance. Namely, grade_F and grade_G. >>> percentage = 1. / df['grade'].value_counts(normalize=True) >>> print(percentage ) B 0.295436 C 0.295362 A …

python python-3.x keras multilabel-classification imbalanced-data

asked Sep 17 '19 at 17:26

Revolucion for Monica

votes

1 answer

Deep Learning with Small Datasets and SMOTE

I have a dataset with 6,000 records, split 60-20-20 into train, validation, and test sets. I get an accuracy of around 76% with XGBoost. I converted my data into a time series and applied LSTM/1-D ConvNets, and the accuracy is around 60%. Is my…

machine-learning deep-learning time-series imbalanced-data smote

asked Sep 03 '19 at 17:54

Usman Malik

votes

1 answer

Why do we use the loss to update our model but use the metrics to choose the model we need?

First of all, I am confused about why we use the loss to update the model but use the metrics to choose the model we need. Not all code does this, but most of the code I've seen uses EarlyStopping to monitor the metrics on the validation…

python machine-learning keras imbalanced-data

asked Sep 02 '19 at 11:49

JALS

votes

2 answers

Multi-features modeling based on one binary-feature which is rarely 1 (imbalanced data) when there is a cost

I need to model multivariate time-series data to predict a binary target that is rarely 1 (imbalanced data). That is, we want to model one binary feature (outbreak) that is rarely 1. All of the features are binary and rarely 1. What…

python-3.x feature-selection multiclass-classification imbalanced-data

asked Aug 21 '19 at 00:32

user10296606

votes

1 answer

How to take a more balanced data sample in Python

I have a dataframe with normalized percentage info. E.g.

wordCount  number  Percent
2.0        1282    0.267345
1.0        888     0.185213
3.0        1124    0.170791
4.0        1250    0.152877
5.0        554     0.084864
6.0        333     0.058904
7.0        …

python gamma-distribution gamma imbalanced-data

asked Aug 19 '19 at 03:32

Jennifer

votes

1 answer

Retrieve the indices for only the resampled instances after oversampling using imbalanced-learn?

For a binary text classification problem with imbalanced data, I use the imbalanced-learn library's RandomOverSampler class to balance the classes. Now, I want to retrieve only the instances that were oversampled (replicated) from the original data.…

nlp text-classification indices oversampling imbalanced-data

asked Aug 12 '19 at 15:41

PinkBanter

votes

2 answers

NaNs with customised weighted F1-Score in Keras

I need to compute a weighted F1-score that penalizes errors on my least frequent label more heavily (a typical binary classification problem with an unbalanced dataset). Unfortunately, I don't get a valid F1-score. The following are my metrics…

keras metrics imbalanced-data

asked May 02 '18 at 16:11

Guido

-1

votes

0 answers

“DataConversionWarning” when Training Logistic Regression Model with Unbalanced Data

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) model = LogisticRegression() model.fit(X_train,…

python scikit-learn logistic-regression imbalanced-data

asked Aug 30 '23 at 08:39

julia mathews
