Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
Questions tagged [oversampling]
156 questions
0 votes, 1 answer
Oversampled train set and test set - machine learning classification
Let's say I oversampled my training set after splitting, and then selected the features to extract based on an analysis of the training set.
After this, do I use the oversampled training set together with the testing set to…

Louise
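A minimal sketch of the usual order of operations, assuming plain random oversampling (duplicating minority rows) rather than any particular library: split first, oversample only the training part, and leave the test set at its original class distribution. The helper names split_indices and random_oversample are hypothetical.

```python
import random
from collections import Counter


def split_indices(n, test_frac, seed=0):
    """Shuffle row indices and cut them into (train, test) index lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, cnt in counts.items():
        members = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - cnt):
            j = rng.choice(members)
            X_out.append(X[j])
            y_out.append(y[j])
    return X_out, y_out


# Toy data: 16 rows of class 0, 4 rows of class 1.
X = [[float(i)] for i in range(20)]
y = [0] * 16 + [1] * 4

train_idx, test_idx = split_indices(len(X), test_frac=0.25)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]  # never resampled
y_test = [y[i] for i in test_idx]

X_train_bal, y_train_bal = random_oversample(X_train, y_train)
print(Counter(y_train), "->", Counter(y_train_bal))
```

Feature selection fitted on the training split would then only be applied (not refitted) to the untouched test rows.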
0 votes, 0 answers
I want to apply oversampling to the minority classes, but it displays an error
I have a dataset that is imbalanced between normal and abnormal ultrasound liver images. I want to balance the dataset using ImageDataGenerator, but it displays an error:
TypeError: __init__() got an unexpected keyword argument 'oversample'
…

hume
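The error indicates that the generator's constructor does not accept an oversample keyword. One framework-independent workaround, sketched here as an assumption rather than a Keras feature, is to oversample at the level of the file list: draw every batch so that each class is equally likely, which repeats minority images. The function name and file names are hypothetical; the returned paths would still need to be loaded into arrays for the model.

```python
import random
from collections import Counter, defaultdict


def balanced_batches(paths, labels, batch_size, seed=0):
    """Endlessly yield (paths, labels) batches in which each class is drawn with equal probability.

    Minority-class files are therefore sampled (with replacement) far more often than
    their share of the dataset, which is a simple form of random oversampling.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in zip(paths, labels):
        by_class[label].append(path)
    classes = sorted(by_class)
    while True:
        batch_paths, batch_labels = [], []
        for _ in range(batch_size):
            cls = rng.choice(classes)
            batch_paths.append(rng.choice(by_class[cls]))
            batch_labels.append(cls)
        yield batch_paths, batch_labels


# Hypothetical file list: 90 "normal" and 10 "abnormal" liver images.
paths = [f"normal_{i}.png" for i in range(90)] + [f"abnormal_{i}.png" for i in range(10)]
labels = ["normal"] * 90 + ["abnormal"] * 10

gen = balanced_batches(paths, labels, batch_size=32)
seen = Counter()
for _ in range(100):
    _, batch_labels = next(gen)
    seen.update(batch_labels)
print(seen)  # class counts across 100 batches
```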
0 votes, 0 answers
PyAudio Oversampling on Windows 10
I have written a program in Python that reads a stream of data from an external sound card. From my Mac I can connect to the sound card at 44.1, 48, 96, 192 and 384 kHz without any issues; however, on the Windows platform it is only possible…

Steven Sesselmann
0 votes, 0 answers
How do inputs to UBL::SmoteClassif() influence vectors lengths passed to Fortran?
I'm using the UBL::SmoteClassif() function in R to oversample minority classes and create a more balanced dataset. I have 8 classes. I had a dataset with 357,038 rows and 147 columns/covariates, and it works. I have another dataset with 186,274 rows and…

Kevin
0 votes, 0 answers
Can't oversample my image data using SMOTE
I'm new to machine learning, and I have been working on a project for early dementia detection using a CNN.
I am facing an issue oversampling my data (the data is MRI images imported from Kaggle, with train and test classes having 4 sub…

manavmalaviya
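SMOTE works on a 2-D feature matrix (imbalanced-learn's SMOTE.fit_resample expects one row per sample), so image data is usually flattened to one vector per image before resampling and reshaped afterwards. The sketch below shows both steps with a deliberately minimal, hand-written SMOTE on tiny 2x2 "images"; the helper names are hypothetical, and it illustrates the interpolation idea rather than replacing a library implementation.

```python
import random


def flatten(img):
    """2-D image (list of rows) -> 1-D feature vector."""
    return [v for row in img for v in row]


def unflatten(vec, height, width):
    """1-D feature vector -> 2-D image (list of rows)."""
    return [vec[r * width:(r + 1) * width] for r in range(height)]


def smote(minority, n_new, k=2, seed=0):
    """Minimal SMOTE: each synthetic vector lies on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: dist2(base, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


# Three tiny 2x2 grayscale "images" from the minority class.
minority_imgs = [
    [[0.0, 0.1], [0.2, 0.3]],
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.9, 0.8], [0.7, 0.6]],
]
vectors = [flatten(img) for img in minority_imgs]
new_imgs = [unflatten(v, 2, 2) for v in smote(vectors, n_new=4)]
print(new_imgs[0])
```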
0 votes, 0 answers
Process oversampled distribution data into normally distributed data?
First, sorry for my poor English skills.
I have data to preprocess, and some columns' distributions look like the figure below.
My first thought is to downsample those peak areas so that the data has a normal-like distribution.
My question is:
if…

hjsg1010
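One way to read "downsample those peak areas" is to bin the column and cap how many rows any single bin may keep, dropping the excess at random. The sketch below does that in plain Python; the function name, bin width and cap are hypothetical tuning choices, and capping flattens the histogram toward a uniform shape rather than guaranteeing a normal one.

```python
import random
from collections import defaultdict


def cap_bins(values, bin_width, cap, seed=0):
    """Keep at most `cap` randomly chosen values from each histogram bin.

    Bins are [k*bin_width, (k+1)*bin_width); values in over-populated bins
    (the "peaks") are randomly downsampled, while sparse bins are untouched.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for v in values:
        bins[int(v // bin_width)].append(v)
    kept = []
    for members in bins.values():
        if len(members) > cap:
            members = rng.sample(members, cap)
        kept.extend(members)
    return kept


# A column with a sharp spike at 5.0 on top of a spread of other values.
column = [5.0] * 200 + [x / 10 for x in range(100)]
reduced = cap_bins(column, bin_width=1.0, cap=20)
print(len(column), "->", len(reduced))
```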
0 votes, 0 answers
How to define "sampling_strategy" in SMOTE and RandomUnderSampling for Multiclass Classification problem?
I am solving a multiclass classification problem using LinearSVC(), where each class has the following number of samples (training data):
Counter({7: 4799, 6: 4713, 4: 4448, 3: 419, 2: 405, 5: 324, 0: 214, 1: 64})
I tried both oversampling using SMOTE and…

Vedant Sharma
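In imbalanced-learn, sampling_strategy can be a dict mapping each targeted class to the number of samples it should have after resampling. For an over-sampler such as SMOTE each value must be at least the class's current count, and for an under-sampler such as RandomUnderSampler at most its current count; classes left out of the dict are not resampled. The sketch below builds both dicts from the counts in the question; the helper names and the targets 2000 and 3000 are arbitrary placeholders, not recommendations.

```python
from collections import Counter

# Class counts of the training data, as listed in the question.
counts = Counter({7: 4799, 6: 4713, 4: 4448, 3: 419, 2: 405, 5: 324, 0: 214, 1: 64})


def over_strategy(counts, floor):
    """Raise every class below `floor` up to `floor`; other classes are omitted (left as is)."""
    return {cls: floor for cls, n in counts.items() if n < floor}


def under_strategy(counts, ceiling):
    """Cut every class above `ceiling` down to `ceiling`; other classes are omitted (left as is)."""
    return {cls: ceiling for cls, n in counts.items() if n > ceiling}


over = over_strategy(counts, floor=2000)      # e.g. SMOTE(sampling_strategy=over)
under = under_strategy(counts, ceiling=3000)  # e.g. RandomUnderSampler(sampling_strategy=under)
print(over)
print(under)
```

Applied one after the other (over-sampling first), these two dicts would leave every class with either 2000 or 3000 samples.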
0 votes, 0 answers
SMOTE runs forever with no result (small dataset)
I'm trying to use SMOTE or SMOTEENN on my dataset, which contains only 2000 rows. My setup is the following:
sklearn: '1.1.3'
imblearn: '0.9.1'
python: '3.10.8'
when running
smt = SMOTEENN(random_state=42, n_jobs=-1)
and
X_train_SMOTE, y_train_SMOTE…

joe.somewhere
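For context, SMOTEENN first over-samples with SMOTE and then cleans the result with Edited Nearest Neighbours (ENN), which drops samples whose neighbours disagree with their label. The pure-Python sketch below shows only that cleaning idea, using a majority-vote rule with k = 3; imbalanced-learn's EditedNearestNeighbours has its own selection rule (kind_sel) and defaults, so this is an illustration, not the library's exact behaviour. The function name enn_clean is hypothetical.

```python
from collections import Counter


def enn_clean(X, y, k=3):
    """Drop each sample whose label differs from the majority label of its k nearest neighbours."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    keep = []
    for i, xi in enumerate(X):
        others = sorted((j for j in range(len(X)) if j != i), key=lambda j: dist2(xi, X[j]))
        majority, _ = Counter(y[j] for j in others[:k]).most_common(1)[0]
        if majority == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Two 1-D clusters, plus one class-1 point sitting inside the class-0 cluster.
X = [[0.0], [0.1], [0.2], [0.3], [0.15], [5.0], [5.1], [5.2], [5.3]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
X_clean, y_clean = enn_clean(X, y)
print(X_clean, y_clean)
```

Each sample needs a neighbour search over all other rows, which is worth keeping in mind when this step runs after SMOTE has enlarged the dataset.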
0 votes, 0 answers
Oversampling method (SMOTE, ADASYN, and borderline SMOTE)
I was supposed to work with this data after oversampling, so I chose SMOTE, ADASYN and BorderlineSMOTE to figure out which sampling method is best.
But the thing is, when I applied those three samplers, they seem to be creating exact…

Nini
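All three samplers build synthetic points by interpolating between minority samples; they differ mainly in which minority samples get used. Borderline-SMOTE only draws from minority samples near the class boundary, while ADASYN gives more synthetic samples to minority points that have more majority-class neighbours. As a rough illustration of the ADASYN allocation step only (k = 3, one-dimensional toy data, hypothetical helper name), the sketch below computes how many synthetic samples each minority point would receive.

```python
def adasyn_allocation(X, y, minority_label, n_total, k=3):
    """Split n_total synthetic samples across minority points in proportion to the
    share of majority-class points among each point's k nearest neighbours (ADASYN)."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    ratios = []
    for i in minority_idx:
        nearest = sorted((j for j in range(len(X)) if j != i), key=lambda j: dist2(X[i], X[j]))[:k]
        ratios.append(sum(y[j] != minority_label for j in nearest) / k)
    total = sum(ratios)
    if total == 0:  # no minority point has majority neighbours: this sketch allocates nothing
        return {i: 0 for i in minority_idx}
    # Rounding means the counts may not sum exactly to n_total.
    return {i: round(r / total * n_total) for i, r in zip(minority_idx, ratios)}


# 1-D toy data: majority class 0 at 0..4; minority class 1 has one point inside the
# majority region (2.1) and a small cluster far away (8.0, 8.5, 9.0).
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [2.1], [8.0], [8.5], [9.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1]
alloc = adasyn_allocation(X, y, minority_label=1, n_total=10)
print(alloc)  # keys are row indices of the minority points
```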
0 votes, 1 answer
Combination of CalibratedClassifierCV with RandomOverSampler
When using a classifier like GaussianNB(), the resulting .predict_proba() values are sometimes poorly calibrated; that's why I'd like to wrap this classifier into sklearn's CalibratedClassifierCV.
I have now a binary classification problem with only…

Requin
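A point worth keeping in mind for this combination: a classifier trained on a randomly oversampled (balanced) set learns the training class prior, so if the data used for calibration is also oversampled, the calibrated probabilities inherit that shifted prior. A common correction, named here plainly as a separate technique and not as what CalibratedClassifierCV does, rescales each class probability by the ratio of true to training prior and renormalises. The function name and the priors below are hypothetical.

```python
def correct_for_prior_shift(probs, train_prior, true_prior):
    """Rescale class probabilities learned under `train_prior` to `true_prior`.

    Each class probability is multiplied by true_prior[c] / train_prior[c] and the
    result is renormalised so it sums to 1 (Bayes' rule under a change of class priors).
    """
    weighted = {c: p * true_prior[c] / train_prior[c] for c, p in probs.items()}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


# Model trained on a 50/50 oversampled set; in reality only 10% of cases are positive.
train_prior = {0: 0.5, 1: 0.5}
true_prior = {0: 0.9, 1: 0.1}

print(correct_for_prior_shift({0: 0.5, 1: 0.5}, train_prior, true_prior))
```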
0 votes, 0 answers
BERT - The truth value of a DataFrame is ambiguous
I am getting into deep learning for some of my models and I am running into issues. I wanted to get it to work simply without any adjustments in the data, but I got
Graph execution error:
followed by a bunch of lines like
File…

Rob
0 votes, 0 answers
ROSE() in R giving me negative samples when all values in training set are positive integers
I am oversampling my training dataset using ROSE() in R as below, and the oversampled dataset contains several negative values for columns that are meant to be strictly positive. The original training data is also positive, so I am surprised that…

DV24
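ROSE generates synthetic rows with a smoothed bootstrap: it resamples an observation and adds kernel (Gaussian) noise around it, so nothing restricts new values to the range of the original data, and values close to zero can cross into negatives. The pure-Python sketch below imitates that mechanism on one positive column with a hypothetical bandwidth, then shows one possible workaround (an assumption, not a ROSE option): resample on the log scale and transform back, which keeps every value positive. Rounding back to integers would be a further, separate step.

```python
import math
import random


def smoothed_bootstrap(values, n, bandwidth, seed=0):
    """Resample values with replacement and add Gaussian noise (a ROSE-like smoothed bootstrap)."""
    rng = random.Random(seed)
    return [rng.choice(values) + rng.gauss(0.0, bandwidth) for _ in range(n)]


def smoothed_bootstrap_log(values, n, bandwidth, seed=0):
    """Same idea on log(values), then exponentiate: results are always > 0."""
    logs = [math.log(v) for v in values]
    return [math.exp(v) for v in smoothed_bootstrap(logs, n, bandwidth, seed)]


column = [1, 2, 3, 5, 8]  # strictly positive training values
raw = smoothed_bootstrap(column, n=1000, bandwidth=1.5)
logged = smoothed_bootstrap_log(column, n=1000, bandwidth=0.5)
print(min(raw), min(logged))
```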
0 votes, 0 answers
Different training score but same test score when using pipeline
I have a problem: I get different training scores when using a pipeline and when doing the steps manually.
MANUAL:
#standardize data
sc=StandardScaler()
X_train[['age','balance','duration']] =…

new_data
0 votes, 0 answers
Fit the model using entire data or from training data?
I am given two datasets.
First, the training data with a known class (target).
Second, the test data with no class (no target).
I split the training data into a train set and a validation set.
I oversample the train set and test it on my validation set.
It…

Rotimi Omosewo
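A minimal sketch of the usual pattern, assuming random oversampling on row indices and a hypothetical model: resample only the training part of each cross-validation fold, keep every validation fold at its original distribution, and, once settings are chosen, refit once on all labelled rows (oversampled the same way) before predicting the unlabelled test data. The helper names are hypothetical.

```python
import random
from collections import Counter


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def oversample_indices(indices, y, seed=0):
    """Random oversampling on index level: repeat minority indices until classes are equal."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(y[i], []).append(i)
    target = max(len(v) for v in by_class.values())
    out = list(indices)
    for members in by_class.values():
        out.extend(rng.choice(members) for _ in range(target - len(members)))
    return out


y = [0] * 40 + [1] * 10  # labelled data only; the unlabelled test set is not involved here
folds = kfold_indices(len(y), k=5)
for f, val_idx in enumerate(folds):
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    fit_idx = oversample_indices(train_idx, y, seed=f)  # resample the training part only
    # A (hypothetical) model would be fitted on rows fit_idx and scored on rows val_idx.
    print(f, Counter(y[i] for i in fit_idx), Counter(y[i] for i in val_idx))

# After choosing settings, refit once on all labelled rows, oversampled the same way.
final_fit_idx = oversample_indices(list(range(len(y))), y)
```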
0 votes, 0 answers
Oversample using SMOTE on certain features
My dataset has 5 features. 3 of the 5 are categorical and very imbalanced.
First question: should I split the data into train and test sets before applying SMOTE?
Second question: if yes to Q1, how do I apply SMOTE to only those 3 features while leaving the…

tnguyen8935
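For a mix of categorical and continuous columns, imbalanced-learn provides SMOTENC, which takes a categorical_features argument. Its idea, following the SMOTE-NC method, is to interpolate only the continuous columns and give each categorical column the most frequent value among the nearest neighbours. The sketch below imitates that for a single synthetic row; it simplifies the distance (continuous columns only, no penalty for categorical mismatches), and the function name and column layout are hypothetical. As with plain SMOTE, resampling would be applied to the training split only.

```python
import random
from collections import Counter


def smote_nc_sample(minority, cat_cols, k=3, seed=0):
    """Create one synthetic row from minority rows (lists mixing numbers and category strings).

    Continuous columns are interpolated between a random base row and one of its k nearest
    neighbours; each categorical column takes the most common value among those k neighbours.
    """
    rng = random.Random(seed)
    num_cols = [c for c in range(len(minority[0])) if c not in cat_cols]

    def dist2(a, b):  # simplified: continuous columns only
        return sum((a[c] - b[c]) ** 2 for c in num_cols)

    i = rng.randrange(len(minority))
    base = minority[i]
    neighbours = sorted((r for j, r in enumerate(minority) if j != i), key=lambda r: dist2(base, r))[:k]
    nn = rng.choice(neighbours)
    gap = rng.random()
    new = list(base)
    for c in num_cols:
        new[c] = base[c] + gap * (nn[c] - base[c])
    for c in cat_cols:
        new[c] = Counter(r[c] for r in neighbours).most_common(1)[0][0]
    return new


# Hypothetical minority rows: columns 0 and 1 continuous, columns 2-4 categorical.
minority = [
    [1.0, 10.0, "a", "x", "low"],
    [1.2, 11.0, "a", "y", "low"],
    [0.9, 9.5, "b", "x", "high"],
    [1.1, 10.5, "a", "x", "low"],
]
print(smote_nc_sample(minority, cat_cols={2, 3, 4}))
```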