Questions tagged [smote]

SMOTE is an abbreviation for Synthetic Minority Oversampling TEchnique. This tag refers to the oversampling method commonly used in machine learning to balance the class distribution of a dataset by introducing new minority class examples.

In machine learning, most classifiers work on the assumption that the classes in the training set are roughly balanced. When the classes are imbalanced, classifiers tend to be biased towards predicting the majority class.

One way to overcome this is to interpolate between neighboring minority class instances and so generate artificial minority samples, which is what SMOTE does.
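The interpolation step can be illustrated with a minimal sketch in plain Python. This is not the implementation from any particular library; the function name `smote_sketch`, its parameters `n_new`, `k` and `seed`, the choice of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority samples.

    Hypothetical helper: each new point lies on the line segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours (Euclidean distance is assumed here).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Candidate neighbours: every minority sample except the base itself.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position along the segment from base towards neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Because each synthetic point is a convex combination of two existing minority samples, it always falls between them in feature space. This is also why the technique, in this basic form, only makes sense for numeric features.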

Useful references:

One of the earlier publications on SMOTE: Chawla et al. 2002

One review on SMOTE: Fernández et al. 2017

Influence of datasets on SMOTE: Skryjomski et al. 2017

Python toolbox for imbalanced datasets: Lemaître et al. 2017

185 questions

-1

votes

1 answer

Python: should Data Scaling be done before Sampling in Machine Learning?

When should I do data scaling and Sampling (since my data is imbalanced)? Should I do data scaling first then Sampling?

asked Nov 26 '21 at 22:51

new_bee

-1

votes

1 answer

Tokenization of unbalanced dataset

I'm working with a dataset of emails' content which I want to transform with doc2vec. This is a labeled dataset (spam/not-spam) and it is unbalanced (90-10 ratio). My question is: when tokenizing the emails' content, should I first oversample (using…

machine-learning nlp doc2vec imbalanced-data smote

asked Jan 07 '21 at 10:31

Efrat Magidov

-1

votes

2 answers

Why do I get 'Error in T[, col] <- data[, col]' when I use SMOTE in R?

I have a big dataset of fire occurring in forests, and I want to predict when the fire ignites. This happens very rarely: 290 times out of 620 000 times. A tibble: 62,905 x 13 amplitude polarity DEM_avg DC DMC DSR FFMC Pd RH TEMP …

r logistic-regression smote

asked Jul 06 '20 at 06:46

Thomas

-3

votes

0 answers

Tensorflow and Scikit learn problem - repeating accuracies that always equal 1

I'm working on a project using Tensorflow and Scikit learn. The dataset that I am working with is imbalanced so needed to use SMOTE. When I try to run data after applying SMOTE my accuracies always seem to be either 0.11 and 0.89, or 0.91 and 0.09.…

python tensorflow scikit-learn classification smote

asked Aug 25 '23 at 16:38

Dr-newby

-3

votes

1 answer

Resampled data Does not show any value for target class after applying SMOTE

I am a newbie in ML and I am trying to implement SMOTE on the PIDD dataset for diabetes prediction. from sklearn.model_selection import train_test_split from imblearn.over_sampling import SMOTE #os = SMOTE() X = exTrans.drop(['Outcome'], axis=1) y =…

python machine-learning deep-learning artificial-intelligence smote

asked Mar 03 '23 at 13:59

Ebuka Eluzai
