Questions tagged [oversampling]

Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
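
As a rough illustration of what "adjusting the class distribution" means, here is a minimal sketch of random oversampling in plain Python. The function name `random_oversample` and the toy 8:1 data are assumptions made for this example. Libraries such as imbalanced-learn provide more capable versions of the same idea.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows of this class are the pool we draw duplicates from.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data with an 8:1 class ratio (made up for illustration).
rows = [[i] for i in range(9)]
labels = [0] * 8 + [1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels), "->", Counter(balanced_labels))
```

Undersampling is the mirror image: rows are dropped from the larger classes instead of duplicated in the smaller ones.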

156 questions
2
votes
2 answers

Over-Sampling Class Imbalance Train/Test Split "Found input variables with inconsistent numbers of samples" Solution?

Trying to follow this article to perform over-sampling for imbalanced classification. My class ratio is about 8:1. https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets/notebook I am confused on the pipeline + coding…
thePurplePython
  • 2,621
  • 1
  • 13
  • 34
2
votes
0 answers

How do I avoid SMOTE feature name mismatch?

I am building a GBM to calculate something that is very low likelihood, and my model is performing in line with random numbers with my features (i.e. badly), so I am trying to use SMOTE to overcome the domination of my outcomes (98.55% 0, 1.45%…
Violatic
  • 374
  • 2
  • 18
2
votes
2 answers

How to oversample to fix class imbalance in time series data?

I have a time series with hourly frequency and a label per day. I would like to fix the class imbalance by oversampling while preserving the sequence for each one-day period. Ideally I would be able to use ADASYN or another method better than random…
JHall651
  • 427
  • 1
  • 4
  • 15
2
votes
1 answer

SMOTE to balance over 200 classes in R

I have a two-column dataset (feature and class) with over 200 classes into which the input features have to be classified. The number of occurrences per class ranges from 1 to a few thousand. The features column has text and numbers. I tried…
chas
  • 1,565
  • 5
  • 26
  • 54
2
votes
1 answer

Will oversampling lead to an overfitted model?

The target attribute distribution is currently like this: mydata.groupBy("Churn").count().show()
+-----+-----+
|Churn|count|
+-----+-----+
|    1|  483|
|    0| 2850|
+-----+-----+
My questions are: methods of oversampling like: manually, SMOTE,…
aquarian47
  • 43
  • 6
2
votes
0 answers

How to handle categorical variable with smotefamily in R?

I'm having a problem with the smotefamily package in R. When I'm dealing with categorical variables with the SMOTE family (SMOTE, Borderline SMOTE and others), it is not possible to generate synthetic examples because they use the distance between…
윤성현
  • 31
  • 3
2
votes
2 answers

How to use over-sampled data in cross validation?

I have an imbalanced dataset. I am using SMOTE (Synthetic Minority Oversampling Technique) to perform oversampling. When performing the binary classification, I use 10-fold cross validation on this oversampled dataset. However, I recently came across…
J Cena
  • 963
  • 2
  • 11
  • 25
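
A common answer to this kind of question is to oversample inside each training fold only, so that the held-out fold never contains copies of training rows. The sketch below shows that ordering in plain Python. It is a simplified illustration under stated assumptions: it uses random duplication rather than SMOTE, the folds are not stratified, and the helper names `k_fold_indices` and `oversample_indices` are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds (not stratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def oversample_indices(train_idx, labels, seed=0):
    """Return train_idx plus random duplicates of minority-class indices,
    so that every class present in the training fold is equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels[i] for i in train_idx)
    target = max(counts.values())
    out = list(train_idx)
    for cls, n in counts.items():
        pool = [i for i in train_idx if labels[i] == cls]
        out.extend(rng.choice(pool) for _ in range(target - n))
    return out


# Toy labels with a 4:1 imbalance (made up for illustration).
labels = [0] * 40 + [1] * 10
for test_idx in k_fold_indices(len(labels), k=5):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # Oversample AFTER the split: only training indices are duplicated.
    train_balanced = oversample_indices(train_idx, labels)
    # A model would be fitted on train_balanced and scored on test_idx here.
    assert held_out.isdisjoint(train_balanced)
```

Oversampling the whole dataset before splitting would let duplicates (or, with SMOTE, near-duplicates) of a test row appear in the training fold, which tends to inflate the cross-validation score.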
1
vote
0 answers

Data augmentation for numeric data with autoencoder

I want to make new data based on existing data. The existing data has a layout like the example link below. How can I make new data with an autoencoder? There are so many examples with image data, but I need to generate numeric (integer) data.
1
vote
1 answer

Error using SMOTE TypeError: cannot safely cast non-equivalent float64 to int64

I'm preparing an unbalanced dataset and would like to use a Python package called SMOTE. When I try to run the code, it raises an error: TypeError: cannot safely cast non-equivalent float64 to int64 My dataset (first 5 rows): Dataset The error…
1
vote
2 answers

oversampling (SMOTE) does not work properly when fitted inside a pipeline

I have an imbalanced classification problem and I am using make_pipeline from imblearn. The steps are the following: kf = StratifiedKFold(n_splits=10, random_state=42, shuffle=True) params = { 'max_depth': [2,3,5], # …
1
vote
1 answer

oversampling some classes from time series data

For a classification task, I have multivariate time series data composed of 4 satellite images in the form (145521 pixels, 4 dates, 2 bands). I made a classification with tempCNN to classify the data into 5 classes. However there is a big gap…
ala
  • 11
  • 2
1
vote
1 answer

Is it possible to Oversample just one class out of 13?

I was wondering if it is possible to apply SMOTE or similar techniques to only one minority class. I have a text classification problem where all minority classes have good accuracies (unique words that differentiate them) except for one class where…
Yasmen
  • 11
  • 1
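
Targeting a single class is straightforward with random duplication, as the sketch below suggests. The helper `oversample_one_class` and the toy documents are assumptions for this example, and it duplicates rows rather than synthesising new ones as SMOTE does. In imbalanced-learn, the `sampling_strategy` argument of the over-samplers can, per its documentation, be given as a dict mapping a class label to the desired number of samples, which serves the same purpose.

```python
import random
from collections import Counter


def oversample_one_class(rows, labels, target_class, target_count, seed=0):
    """Append random duplicates of target_class rows until that class has
    target_count rows; all other classes are left untouched (hypothetical helper)."""
    rng = random.Random(seed)
    pool = [r for r, y in zip(rows, labels) if y == target_class]
    if not pool:
        raise ValueError(f"no rows with label {target_class!r}")
    n_extra = max(0, target_count - len(pool))
    extra = [rng.choice(pool) for _ in range(n_extra)]
    return list(rows) + extra, list(labels) + [target_class] * n_extra


# Toy text data (made up): only class "c" is boosted.
docs = ["doc a1", "doc a2", "doc a3", "doc b1", "doc b2", "doc c1"]
labels = ["a", "a", "a", "b", "b", "c"]
new_docs, new_labels = oversample_one_class(docs, labels, "c", target_count=3)
print(Counter(new_labels))
```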
1
vote
0 answers

RandomOverSampling without Replacement

Is it possible to oversample the dataset without replacement? For RandomUnderSampling, there exists a boolean hyperparameter [replacement], but this hyperparameter doesn't exist in RandomOverSampling. Looking at RandomOverSampling Docs: Object to…
1
vote
1 answer

step_rose() fails in tune grid

I noticed that when training with certain engines (e.g. keras and xgboost) the recipe returns more ys than Xs. Here you'll find a minimal reproducible…
Marco Repetto
  • 336
  • 2
  • 15
1
vote
1 answer

Four times four oversampling performance

In the process of making a rendering engine that fundamentally relies on four times four oversampling, I ran into the performance of the downscaling itself. #include const int_fast32_t sRGBtolinear[256] = {0, 20, 40, 60, 80, 99, 119,…
01o
  • 49
  • 2