Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
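As a rough illustration of what "adjusting the class distribution" means, here is a minimal sketch using only the Python standard library. The function names `random_oversample` and `random_undersample` are made up for this example and do not belong to any particular library; both balance the classes by plain random duplication or random dropping, not by generating synthetic points.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a {label: [rows]} mapping."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sampling WITH replacement: the same row may be copied more than once.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Drop randomly chosen rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sampling WITHOUT replacement: each kept row appears once.
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy data with an 8:1 class ratio.
rows = list(range(9))
labels = [0] * 8 + [1]
over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print(Counter(over_labels))   # both classes now have 8 rows
print(Counter(under_labels))  # both classes now have 1 row
```

Oversampling keeps all the original rows and adds copies, while undersampling discards majority rows, so which one is appropriate depends on how much data can be spared.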
Questions tagged [oversampling]
156 questions
2 votes, 2 answers
Over-Sampling Class Imbalance Train/Test Split "Found input variables with inconsistent numbers of samples" Solution?
Trying to follow this article to perform over-sampling for imbalanced classification. My class ratio is about 8:1.
https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets/notebook
I am confused on the pipeline + coding…

thePurplePython
2 votes, 0 answers
How do I avoid SMOTE feature name mismatch?
I am building a GBM to predict a very low-likelihood event, and my model performs no better than random numbers with my features (i.e. badly), so I am trying to use SMOTE to overcome the dominance of the majority outcome (98.55% 0, 1.45%…

Violatic
2 votes, 2 answers
How to oversample to fix class imbalance in time series data?
I have a time series with hourly frequency and a label per day. I would like to fix the class imbalance by oversampling while preserving the sequence for each one-day period. Ideally I would be able to use ADASYN or another method better than random…

JHall651
2 votes, 1 answer
SMOTE to balance over 200 classes in R
I have a two-column dataset (feature and class) with over 200 classes into which the input features have to be classified. The number of occurrences per class ranges from 1 to a few thousand. The features column has text and numbers. I tried…

chas
2 votes, 1 answer
Will oversampling lead to an overfitted model?
The target attribute distribution is currently like this:
mydata.groupBy("Churn").count().show()
+-----+-----+
|Churn|count|
+-----+-----+
|    1|  483|
|    0| 2850|
+-----+-----+
My questions are:
methods of oversampling like: manually, SMOTE,…
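To put numbers on the distribution shown in this question, here is a small sketch that works out the minority share, the imbalance ratio, and how many rows would have to be added to reach a 1:1 balance. The helper name `imbalance_summary` is made up for this example; only the two counts (483 and 2850) come from the question.

```python
def imbalance_summary(counts):
    """Summarize a {label: count} mapping for a classification target."""
    total = sum(counts.values())
    majority = max(counts.values())
    minority = min(counts.values())
    return {
        "total": total,
        "minority_share": minority / total,
        "imbalance_ratio": majority / minority,
        # Rows to add to each class so that it matches the majority class.
        "rows_to_add": {label: majority - n for label, n in counts.items()},
    }


# Counts taken from the Churn table in the question.
summary = imbalance_summary({1: 483, 0: 2850})
print(summary["total"])                      # 3333
print(round(summary["minority_share"], 3))   # 0.145
print(round(summary["imbalance_ratio"], 1))  # 5.9
print(summary["rows_to_add"])                # {1: 2367, 0: 0}
```

Fully balancing this target would mean adding 2367 duplicated or synthetic churn rows, roughly 4.9 extra rows for every original one. That heavy reuse of a small set of examples is one reason overfitting is a reasonable worry, especially if the added rows end up in the evaluation data as well.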

aquarian47
2 votes, 0 answers
How to handle categorical variable with smotefamily in R?
I'm having a problem with the smotefamily package in R.
When I'm dealing with categorical variables with the SMOTE family (SMOTE, Borderline SMOTE and others), it is not possible to generate synthetic examples because they use the distance between…

윤성현
2 votes, 2 answers
How to use over-sampled data in cross validation?
I have an imbalanced dataset. I am using SMOTE (Synthetic Minority Oversampling Technique) to perform oversampling. When performing the binary classification, I use 10-fold cross validation on this oversampled dataset.
However, I recently came across…
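The usual concern with cross-validating an already oversampled dataset is that copies (or close synthetic neighbours) of the same minority row can land in both the training and the held-out fold. Below is a minimal sketch of the alternative ordering: split first, then oversample only the training part of each fold. It uses plain random duplication instead of SMOTE to stay dependency-free, the folds are not stratified, and the names `oversample` and `folds_with_train_only_oversampling` are made up for this example.

```python
import random
from collections import Counter


def oversample(rows, labels, rng):
    """Randomly duplicate rows until every class matches the largest class."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    pairs = []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        pairs.extend((row, label) for row in members + extra)
    rng.shuffle(pairs)
    return [row for row, _ in pairs], [label for _, label in pairs]


def folds_with_train_only_oversampling(rows, labels, k=5, seed=0):
    """Yield (train_rows, train_labels, test_rows, test_labels) for each of k folds.

    Only the training part is oversampled; the held-out fold keeps the
    original, imbalanced distribution.
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        train_rows, train_labels = oversample(
            [rows[i] for i in train_idx], [labels[i] for i in train_idx], rng
        )
        test_rows = [rows[i] for i in held_out]
        test_labels = [labels[i] for i in held_out]
        yield train_rows, train_labels, test_rows, test_labels


# Toy data: 100 distinct rows, 10 of them in the minority class.
rows = list(range(100))
labels = [1 if i < 10 else 0 for i in rows]
results = list(folds_with_train_only_oversampling(rows, labels, k=5))
for train_rows, train_labels, test_rows, test_labels in results:
    # No original row appears on both sides of the split.
    print(Counter(train_labels), len(test_rows), set(train_rows) & set(test_rows))
```

With imbalanced-learn, putting the sampler inside an `imblearn.pipeline.Pipeline` and passing that pipeline to the cross-validation routine should give the same ordering, since samplers in that pipeline are applied only when fitting.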

J Cena
1 vote, 0 answers
Data augmentation for numeric data with autoencoder
I want to make new data based on existing data.
The existing data has a layout like the example linked below.
How can I make new data with an autoencoder?
There are many examples with image data, but I need to generate numeric (integer) data.

Nathan Song
1 vote, 1 answer
Error using SMOTE TypeError: cannot safely cast non-equivalent float64 to int64
I'm preparing an unbalanced dataset and would like to use SMOTE in Python. When I try to run the code, it raises an error: TypeError: cannot safely cast non-equivalent float64 to int64
My dataset (first 5 rows):
The error…

Antonio Boza
1 vote, 2 answers
oversampling (SMOTE) does not work properly when fitted inside a pipeline
I have an imbalanced classification problem and I am using make_pipeline from imblearn.
The steps are the following:
kf = StratifiedKFold(n_splits=10, random_state=42, shuffle=True)
params = {
'max_depth': [2,3,5],
# …

xavi
1 vote, 1 answer
oversampling some classes from time series data
For a classification task, I have multivariate time series data composed from 4 satellite images in the form (145521 pixels, 4 dates, 2 bands).
I used tempCNN to classify the data into 5 classes. However, there is a big gap…

ala
1 vote, 1 answer
Is it possible to Oversample just one class out of 13?
I was wondering if it is possible to apply SMOTE or similar techniques to only one minority class. I have a text classification problem where all minority classes have good accuracy (unique words that differentiate them) except for one class where…

Yasmen
1 vote, 0 answers
RandomOverSampling without Replacement
Is it possible to oversample the dataset without replacement? For RandomUnderSampling, there exists a boolean hyperparameter [replacement], but this hyperparameter doesn't exist in RandomOverSampling.
Looking at the RandomOverSampling docs:
Object to…

Ahmad hassan
1 vote, 1 answer
step_rose() fails in tune grid
I noticed that when training with certain engines (e.g. keras and xgboost) the recipe returns more ys than Xs.
Here you'll find a minimal reproducible…

Marco Repetto
1 vote, 1 answer
Four times four oversampling performance
While building a rendering engine that fundamentally relies on four times four oversampling, I ran into a performance problem in the downscaling itself.
#include
const int_fast32_t sRGBtolinear[256] = {0, 20, 40, 60, 80, 99, 119,…

01o