Questions tagged [smote]

SMOTE is an abbreviation for Synthetic Minority Oversampling TEchnique. This tag refers to the oversampling method commonly used in machine learning to balance class distributions in datasets by introducing new, synthetic minority-class examples.

In machine learning, most classifiers work on the assumption that the classes in the training set are roughly balanced. When classes are imbalanced, classifiers tend towards predicting the majority class.

One way to overcome this is to interpolate between neighboring minority-class instances and so generate artificial samples.
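The interpolation idea can be illustrated with a minimal sketch in plain Python. This is not the reference implementation from any library; the function name `smote_sketch`, its parameters, and the toy data are assumptions made up for illustration. For each synthetic sample it picks a minority point, picks one of its k nearest minority neighbors, and places a new point at a random position on the segment between them.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Hypothetical helper: create n_new synthetic points by interpolating
    between a minority point and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbors: every other minority point, nearest first.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority class with two features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region already covered by the minority class. In practice a maintained implementation, such as the Python toolbox listed in the references below, is the usual choice.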

Useful references:

One of the earlier publications on SMOTE: Chawla et al. 2002

One review on SMOTE: Fernández et al. 2017

Influence of datasets on SMOTE: Skryjomski et al. 2017

Python toolbox for imbalanced datasets: Lemaître et al. 2017

185 questions
1
vote
0 answers

How to use the smote function and XGBoost in R at the same time

Does anyone know how to use the smote function and XGBoost in R at the same time? I used the smote function to balance the classes first, then used XGBoost for training. However, one of the parameters in the xgb.cv function is label, which requires as.matrix…
Gracetam
1
vote
1 answer

Using SMOTE-NC with categorical variables only

I am dealing with a dataframe containing only categorical features. To reproduce the issue I am facing I am going to make the following example: d = {'col1':['a','b','c','a','c','c','c','c','c','c'], …
Wiliam
1
vote
1 answer

How to use a dictionary in the SMOTE algorithm to resample multi-class input data differently?

I want to perform oversampling using the SMOTE algorithm in Python using the library imblearn.over_sampling. My input data has four target classes. I don't want to oversample all the minority class distribution to match with the majority class…
1
vote
1 answer

Saving oversampled dataset as csv file in pandas

I am new to Python and apologize in advance, if it is too simple. Cannot find anything and this question did not help. My code is # Split data y = starbucks_smote.iloc[:, -1] X = starbucks_smote.drop('label', axis = 1) # Count labels by…
Anakin Skywalker
1
vote
0 answers

Classification model for rare events when SMOTE doesn't work

I have a data frame with rare events in the target variable: label=1 makes up less than 1%, and I want to build a classification model. All the classic models display poor performance, and using SMOTE or other sampling techniques didn't make a…
Lili
1
vote
1 answer

How to remove minority classes with less than a certain number of examples before performing SMOTE, python

I have a dataset which contains 100 columns as feature vectors (100D feature vectors) generated from word2vec, and my target is a categorical variable for each row of vectors in my dataset. Now there are around 1000 different categorical…
Erich
1
vote
0 answers

Incorporating SMOTE using Python. Highly imbalanced dataset

I have been trying to play around with certain datasets I found on GitHub to see how well I can conduct a sentiment analysis on different datasets and how the code works. So I have a dataset which I wanted to incorporate in the code I found the only…
user11396788
1
vote
1 answer

Create balanced dataset 1:1 using SMOTE without modifying the observations of the majority class in R

I am working on a binary classification problem for which I have an unbalanced dataset. I want to create a new, more balanced dataset with 50% of observations in each class. For this, I am using the SMOTE algorithm in R provided by the DMwR library. In the…
Mouaici_Med
1
vote
1 answer

Oversampling Using SMOTE Removes a Label Category from y_train

I'm using an LSTM for sentiment analysis on an imbalanced dataset with 86% positive-class and 14% negative-class samples. It's a very small dataset with 472 sentences, but they're in a regional language. Train_test_split ratio is 0.3. I'm having two…
1
vote
0 answers

SMOTE on dataframe of arrays issues

I'm trying to apply SMOTE to a dataframe full of sliding windows here: DataFrame. I'm using imblearn's SMOTE() function on it. Without any manipulation, I'm getting an error that each cell must have a size 1 array. SMOTING individually by rows…
1
vote
0 answers

How to add ANN in python smote_variants model_selection?

I am working with the library smote-variants for Python (https://pypi.org/project/smote-variants/) and I want to use the method model_selection to select the best classifier and the best resampling method. I do it just like this: lista_oversamplers…
jartymcfly
1
vote
1 answer

SMOTE is giving array size / ValueError for all-categorical dataset

I am using SMOTE-NC for oversampling my categorical data. I have only 1 feature and 10500 samples. While running the code below, I am getting the error: ValueError …
Shivam Agrawal
1
vote
1 answer

Score decline with imblearn pipeline and SMOTE

I have a pipeline: np.random.seed(42) tf.random.set_seed(42) pipeline = Pipeline([ ('smote', SMOTE()), ('under',RandomUnderSampler()), ('cl', KerasClassifier(build_fn=create_model, verbose=0)) ]) param_grid_pipeline = { …
belz
1
vote
0 answers

I'm handling an imbalanced dataset. Does applying the SMOTE algorithm to up-sample the minority class yield the same accuracy and roc_auc_score?

The following code shows how, when trained on a SMOTE-produced dataset, the accuracy and roc_auc_score come out the same. sm = SMOTE(random_state = 2) #creating SMOTE object X_t, y_t = sm.fit_sample(X, y) #fitting X and y in smote and storing the new…
1
vote
1 answer

Variation problem of SMOTE for regression

I'm working on the estimation of ticket sales with insufficient and imbalanced data. To fix the problem, I'm using SMOTER (SMOTE for regression) from the smogn package, but each time I run my model, I get different predictions for my target. I reckon…