Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
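The two basic forms can be sketched in plain Python. Random oversampling duplicates randomly chosen rows of the smaller classes, and random undersampling discards rows of the larger classes. The helper names random_oversample and random_undersample below are hypothetical and used only for illustration. They are not a library API; libraries such as imbalanced-learn ship production versions.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw (target - n) extra rows of this class, with replacement.
        for i in rng.choices(idx, k=target - n):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each larger class so every class has as
    many rows as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Sample without replacement down to the minority-class size.
        for i in rng.sample(idx, target):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2  # toy data with an 8:2 class ratio
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over))   # Counter({0: 8, 1: 8})
print(Counter(y_under))  # Counter({0: 2, 1: 2})
```

Oversampling keeps every original row but can encourage overfitting to repeated minority rows; undersampling avoids duplicates but throws data away.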
Questions tagged [oversampling]
156 questions
One class completely ignored after oversampling
I built a decision tree and oversampled the minority class using SMOTE. After this, class 2 (of classes 0, 1, 2, 3) is completely ignored on the unbalanced test set: nothing is classified as class 2, correctly or incorrectly. How can this be?

maybeyourneighour
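The symptom in this question shows up as an all-zero column in the confusion matrix. The sketch below builds the matrix in plain Python and lists the classes that are never predicted. The labels are made up for illustration and are not the asker's data; confusion_matrix and never_predicted are hypothetical helpers, not scikit-learn's functions. One possible explanation for such a column in a decision tree is that no leaf of the fitted tree has class 2 as its majority class, so the tree can never output it.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows = true class, columns = predicted class (hypothetical helper)."""
    pos = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[pos[t]][pos[p]] += 1
    return m

def never_predicted(y_pred, labels):
    """Classes that never appear among the predictions."""
    predicted = set(y_pred)
    return [lab for lab in labels if lab not in predicted]

labels = [0, 1, 2, 3]
# Illustrative labels only: no sample is ever predicted as class 2.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 3, 1, 0, 3, 3]
cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(label, row)
print("never predicted:", never_predicted(y_pred, labels))  # [2]
```

Column 2 is all zeros: class 2 is neither a correct nor an incorrect prediction for any sample, and its recall is 0.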
How to fix samples < K-neighbours error in oversampling using SMOTE?
I am designing a multi-class classifier for 11 labels and am using SMOTE to tackle the class-imbalance problem. However, I get the following error:
Error at SMOTE
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=42)
X_res, Y_res =…

cappy0704
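SMOTE creates each synthetic sample by interpolating between a minority-class sample and one of its k nearest neighbours from the same class. Every class being oversampled therefore needs more samples than k. In imbalanced-learn this k is the k_neighbors parameter of SMOTE (default 5), and an error like the one in this question typically appears when some class has too few samples for that value. The plain-Python sketch below is a simplified illustration of the interpolation step and of that constraint. It is not imbalanced-learn's implementation, and smote_class and the toy points are hypothetical.

```python
import math
import random

def smote_class(points, n_new, k=5, seed=0):
    """Generate n_new synthetic points for ONE minority class by
    SMOTE-style interpolation (simplified sketch, not imbalanced-learn)."""
    if len(points) <= k:
        # Each point needs k neighbours other than itself.
        raise ValueError(f"class has {len(points)} samples, need more than k={k}")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        base = points[i]
        others = [p for j, p in enumerate(points) if j != i]
        # k nearest neighbours of base within the same class.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between base and nb.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic

minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]  # only 3 samples
try:
    smote_class(minority, n_new=4, k=5)
except ValueError as err:
    print("error:", err)

# Choosing k below the class size avoids the error.
k = min(5, len(minority) - 1)
new_points = smote_class(minority, n_new=4, k=k)
print(len(new_points))  # 4
```

With imbalanced-learn, the analogous fix is to pass a smaller value, for example SMOTE(k_neighbors=2, random_state=42), where k_neighbors must be smaller than the number of samples in the smallest class that is being oversampled.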
What is the correct order for training the ML model?
I have a dataset whose multiclass dependent variable is imbalanced. I want to know the correct order for training the…
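A common recommendation for this ordering question is to split the data first and resample only the training portion. This way, no duplicated or synthetic rows derived from test samples leak into training, and the test set keeps the original class distribution. The sketch below illustrates that order with random oversampling in plain Python. The helpers split_indices and oversample_indices and the toy labels are hypothetical; in practice a library splitter and resampler would be used.

```python
import random
from collections import Counter

def split_indices(n, test_frac=0.25, seed=0):
    """Shuffle row indices and cut them into train/test (hypothetical helper)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    return idx[n_test:], idx[:n_test]

def oversample_indices(idx, y, seed=0):
    """Repeat minority-class indices until all classes match the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for i in idx:
        by_class.setdefault(y[i], []).append(i)
    target = max(len(v) for v in by_class.values())
    out = list(idx)
    for members in by_class.values():
        out += rng.choices(members, k=target - len(members))
    return out

y = [0] * 16 + [1] * 4                         # toy imbalanced labels
train_idx, test_idx = split_indices(len(y))    # 1) split first
train_bal = oversample_indices(train_idx, y)   # 2) resample training rows only
# 3) fit the model on train_bal and evaluate on the untouched test_idx
print("train:", Counter(y[i] for i in train_bal))
print("test: ", Counter(y[i] for i in test_idx))
```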
Balancing a multiclass classification dataset in Python (oversampling)
I have the following classification problem. The training set has 50,000 rows, and Y has 60 labels. But the data is unbalanced: one class has 35,000 values, while the other 59 classes have 15,000 values between them, some with only 30. If, for example,…

Katrin
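With dozens of classes and a very skewed distribution, one option is partial balancing. Instead of raising each small class all the way to the size of the largest one, you raise it to some fraction of that size, which limits how many duplicated or synthetic rows the rarest classes receive. imbalanced-learn's over-samplers accept a dict for sampling_strategy that maps each targeted class to its desired number of samples after resampling. The sketch below only computes such a dict. The function name, the toy labels, and the 0.1 ratio are illustrative assumptions, not a recommended setting.

```python
from collections import Counter

def partial_oversampling_targets(y, ratio=0.1):
    """Map each class smaller than ratio * (largest class size) to that size.
    Classes already at or above it are left out, i.e. not resampled."""
    counts = Counter(y)
    floor = int(ratio * max(counts.values()))
    return {label: floor for label, n in counts.items() if n < floor}

# Toy labels: one dominant class and a few small ones.
y = ["a"] * 1000 + ["b"] * 300 + ["c"] * 40 + ["d"] * 5
targets = partial_oversampling_targets(y, ratio=0.1)
print(targets)  # {'c': 100, 'd': 100}
```

A dict like this could then be passed as, for example, RandomOverSampler(sampling_strategy=targets). If SMOTE is used instead, each targeted class still needs more samples than k_neighbors; class "d" above, with 5 samples, would need a k_neighbors value below 5.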
Can imbalance in class ratio in training and testing set cause poor validation accuracy?
I’m participating in a hackathon where we have to predict whether a user is interested in jobs, given features such as gender, city, training hours, experience, current company, etc.
In the training set, about 90% of users are not interested in…

Vikas NS
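If the class ratio differs between the training and validation sets, a stratified split is the usual way to make the two sets share the same class proportions. The plain-Python sketch below splits the indices within each class separately. The helper stratified_split and the 900/100 toy labels (chosen to mirror the roughly 90% figure in the question) are hypothetical. scikit-learn's train_test_split does the same thing through its stratify argument, e.g. train_test_split(X, y, stratify=y).

```python
import random
from collections import Counter

def stratified_split(y, test_frac=0.2, seed=0):
    """Split row indices so each class keeps (about) the same share in
    train and validation (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    train, val = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_val = round(len(members) * test_frac)
        val += members[:n_val]
        train += members[n_val:]
    return train, val

# Toy labels with a ~90/10 split.
y = ["not_interested"] * 900 + ["interested"] * 100
train, val = stratified_split(y)
for name, idx in (("train", train), ("val", val)):
    counts = Counter(y[i] for i in idx)
    # both lines print {'not_interested': 0.9, 'interested': 0.1}
    print(name, {k: round(v / len(idx), 2) for k, v in counts.items()})
```

Stratification keeps the evaluation consistent with training, but it does not remove the imbalance itself.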
SMOTE oversampling for anomaly detection using a classifier
I have sensor data and want to do live anomaly detection: first use LOF on the training set to detect anomalies, then train a classifier on the labeled data to classify new data points. I thought about using SMOTE because I want more…

Ahmad Ayyad