Questions tagged [oversampling]

Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
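
As an illustration of what adjusting the class distribution means, here is a minimal sketch of random oversampling using only the Python standard library. The function name `random_oversample` and the toy data are hypothetical, chosen for this example only. It duplicates minority-class rows at random until every class is as large as the biggest one. It is one simple variant, not the only way to oversample.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, l in zip(rows, labels) if l == cls]
        # Sample with replacement to make up the shortfall.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data (assumed): four rows of class 0, one row of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(rows, labels)
print(Counter(y_res))  # both classes now have 4 rows
```

Undersampling is the mirror image: rows of the larger classes are dropped instead of smaller classes being duplicated.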

156 questions
-1
votes
1 answer

One class completely ignored after oversampling

I built a decision tree and oversampled the minority class using SMOTE. After this, class 2 (of classes 0, 1, 2, 3) is completely ignored on the unbalanced test set: nothing is classified as class 2, correctly or incorrectly. How can this be?
maybeyourneighour
-1
votes
2 answers

How to fix samples < K-neighbours error in oversampling using SMOTE?

I am designing a multiclass classifier for 11 labels and am using SMOTE to tackle the sampling problem. However, I get the following error: Error at SMOTE from imblearn.over_sampling import SMOTE sm = SMOTE(random_state=42) X_res, Y_res =…
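
The excerpt above is truncated, so the exact error is not shown. A common cause of a "samples < k-neighbours" failure is that SMOTE looks for `k_neighbors` neighbours (5 by default in imbalanced-learn) inside each minority class, which needs at least `k_neighbors + 1` samples in that class. The sketch below assumes that this is the cause. The helper name `safe_k_neighbors` and the toy labels are hypothetical. It picks the largest value that the smallest class can support.

```python
from collections import Counter

def safe_k_neighbors(labels, default=5):
    """Return a k_neighbors value no larger than (smallest class size - 1)."""
    smallest = min(Counter(labels).values())
    if smallest < 2:
        # A class with one sample has no neighbour to interpolate with.
        raise ValueError("every class needs at least 2 samples for SMOTE")
    return min(default, smallest - 1)

# Toy labels (assumed): the rarest class has only 4 samples.
y = [0] * 50 + [1] * 12 + [2] * 4
k = safe_k_neighbors(y)
print(k)  # 3

# With imbalanced-learn installed, the value could then be passed on:
#   from imblearn.over_sampling import SMOTE
#   X_res, y_res = SMOTE(k_neighbors=k, random_state=42).fit_resample(X, y)
```

Classes with a single sample cannot be handled this way. Collecting more data or using plain random oversampling for those classes are possible alternatives.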
-1
votes
1 answer

What is the correct order for training the ML model?

I have a dataset with an imbalanced multiclass dependent variable. I want to know which is the correct order for training the…
-1
votes
1 answer

Balancing a multiclass classification problem in Python (oversampling)

I have the following classification problem. The training set has 50,000 rows and Y has 60 labels. But the data is imbalanced (one class has 35,000 samples, the other 59 classes share 15,000 samples, and some of them have only 30). If for example,…
Katrin
-2
votes
1 answer

Can class imbalance in the training and test sets cause poor validation accuracy?

I’m participating in a hackathon where we are supposed to predict whether a user is interested in jobs, given features like gender, city, training hours, experience, current company, etc. In the training set there are about 90% who are not interested in…
-3
votes
1 answer

SMOTE oversampling for anomaly detection using a classifier

I have sensor data and want to do live anomaly detection: use LOF on the training set to label anomalies, then train a classifier on the labeled data to classify new data points. I thought about using SMOTE because I want more…