Questions tagged [smote]

SMOTE is an abbreviation for Synthetic Minority Over-sampling TEchnique. This tag refers to the oversampling method commonly used in machine learning to balance class distributions in datasets by introducing new, synthetic minority-class examples.

In machine learning, most classifiers work on the assumption that the classes in the training set are roughly balanced. When classes are imbalanced, classifiers tend to favor predicting the majority class.

One way to overcome this is to interpolate between neighboring minority-class instances to generate artificial samples.
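The interpolation idea can be illustrated with a minimal sketch in plain Python. This is not the imbalanced-learn implementation: the function name `smote_sketch` and its parameters `n_new`, `k`, and `seed` are hypothetical, chosen only for illustration, and the sketch assumes numeric features and at least two minority samples.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Illustrative only: assumes numeric feature tuples and len(minority) >= 2.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        base = rng.choice(minority)
        # Find its k nearest minority neighbours (excluding itself), Euclidean distance.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: four minority-class points with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Each synthetic point lies on a line segment between two existing minority points, so it stays inside the region the minority class already occupies. In practice, the `SMOTE` class in `imblearn.over_sampling` (used via `fit_resample`) would normally be applied instead, and only to the training split.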

Useful references:

One of the earlier publications on SMOTE: Chawla et al. 2002

One review on SMOTE: Fernández et al. 2017

Influence of datasets on SMOTE: Skryjomski et al. 2017

Python toolbox for imbalanced datasets: Lemaître et al. 2017

185 questions
-1
votes
1 answer

Python: should Data Scaling be done before Sampling in Machine Learning?

When should I do data scaling and Sampling (since my data is imbalanced)? Should I do data scaling first then Sampling?
new_bee
  • 87
  • 1
  • 8
-1
votes
1 answer

Tokenization of unbalanced dataset

I'm working with a dataset of emails' content which I want to transform with doc2vec. This is a labeled dataset (spam/not-spam) and it is unbalanced (90-10 ratio). My question is: when tokenizing the emails' content, should I first oversample (using…
-1
votes
2 answers

Why do I get 'Error in T[, col] <- data[, col]' when I use SMOTE in R?

I have a big dataset of fire occurring in forests, and I want to predict when the fire ignites. This happens very rarely: 290 times out of 620 000 times. A tibble: 62,905 x 13 amplitude polarity DEM_avg DC DMC DSR FFMC Pd RH TEMP …
Thomas
  • 441
  • 3
  • 16
-3
votes
0 answers

Tensorflow and Scikit learn problem - repeating accuracies that always equal 1

I'm working on a project using Tensorflow and Scikit learn. The dataset that I am working with is imbalanced so needed to use SMOTE. When I try to run data after applying SMOTE my accuracies always seem to be either 0.11 and 0.89, or 0.91 and 0.09.…
-3
votes
1 answer

Resampled data Does not show any value for target class after applying SMOTE

I am a newbie in ML and I am trying to implement SMOTE on the PIDD dataset for diabetes prediction. from sklearn.model_selection import train_test_split from imblearn.over_sampling import SMOTE #os = SMOTE() X = exTrans.drop(['Outcome'], axis=1) y =…