Questions tagged [smote]

SMOTE stands for Synthetic Minority Over-sampling TEchnique. This tag refers to the oversampling method commonly used in machine learning to balance class distributions in datasets by generating new synthetic minority-class examples.

In machine learning, most classifiers work on the assumption that the classes in the training set are roughly balanced. When classes are imbalanced, classifiers tend to favor predicting the majority class.

One way to overcome this is to interpolate between neighboring minority-class instances to generate artificial samples.
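The interpolation idea can be illustrated with a minimal sketch. This is not the implementation of any particular library; the function name `smote_sketch` and its parameters are hypothetical, it uses brute-force Euclidean neighbor search, and it handles numeric features only.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    Hypothetical helper for illustration only (numeric features, brute force).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than other points
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbors at random.
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Each synthetic point lies on a line segment between two existing minority samples, so it never falls outside the range already covered by the minority class. In practice, a maintained implementation such as the imbalanced-learn toolbox listed in the references is usually preferable to hand-written code.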

Useful references:

One of the earlier publications on SMOTE: Chawla et al 2002

One review on SMOTE: Fernández et al 2017

Influence of datasets on SMOTE: Skryjomski et al 2017

Python toolbox for imbalanced datasets: Lemaître et al 2017

185 questions
0
votes
0 answers

Python GridSearchCV SMOTE optimization error

Can anyone help me with an error I am receiving when running grid search? I can't resolve it. My code: # get feature and label data feature_data = creditcard_data.loc[:, creditcard_data.columns != 'Class'] label_data = creditcard_data.loc[:,…
0
votes
2 answers

SMOTE-NC in R. No packages found

I have a dataset with 5 nominal and 37 categorical variables. I want to perform oversampling in R. However, with SMOTE, I cannot do so. I looked for SMOTE-NC as advised by (Chawla, Bowyer and Hall, 2002), but I could not find any package supporting…
0
votes
2 answers

Error in factor(newCases[, a], levels = 1:nlevels(data[, a]), labels = levels(data[, : invalid 'labels'; length 0 should be 1 or 2

I am running the SMOTE function as given below: # install.packages("DMwR") for SMOTE implementation library(DMwR) smoted_data <- SMOTE(state~., deliq, perc.over=200, perc.under = 1600) But I am getting the error below: Error in factor(newCases[, a],…
SKB
0
votes
0 answers

SMOTE with more than 2 classes?

I am working in R with a dataset of Olympic data that is very unbalanced and am looking for a way to balance the data appropriately. After some research, I found that ROSE works nicely, but only with 2 classes. My output has 4 classes…
TFurrer
0
votes
1 answer

Adjust predicted probability after SMOTE

I have an imbalanced data set and I used SMOTE to oversample the minority class and undersample the majority class. Now I want to check the test AUC using predict_proba of the model. I have two questions: 1. Do I have to correct the probability if I…
anat
0
votes
1 answer

Upsampling tweets using SMOTE

I have an imbalanced dataset of tweets labeled as -1, 0, +1. I want to balance the numbers by upsampling. I receive the following error: tweet_train=tweet_train.reshape(-1, 1) X_train_upsample, y_train_upsample =…
Vahid the Great
0
votes
1 answer

Data scaling before calling SMOTENC for continuous and categorical features

So far, my code to run SMOTENC is the following: from imblearn.over_sampling import SMOTENC smt = SMOTENC(random_state=seed,…
mjbsgll
0
votes
1 answer

Imbalance in multi class classification problem - four target levels

My data is imbalanced, as shown below. Whenever I try ADASYN, it shows an error. Do we need to provide any parameter for it? Sometimes it runs for a long time with no response, even after 40 minutes. …
0
votes
1 answer

Should I perform GridSearch (for tuning hyperparameters) before or after SMOTE?

I am using imbalanced data to perform classification with scikit-learn, and to improve the model's accuracy I created more synthetic data with the SMOTE technique. I want to know the best moment to carry out the hyperparameter optimization with…
0
votes
0 answers

ValueError: could not convert string to float SMOTE fit_sample Python Oversampling

I have a credit risk analysis dataset which goes like this: Loan_ID Age Income(LPA) Employed_yr Education Loan_status 1 18 2.4 1 12th 1 2 46 43 26 …
noob
0
votes
0 answers

Error: missing values are not allowed in subscripted assignments of data frames

I am new to R and I am writing R code for my personal project/exercise. The data I am using is from a survey on the ethnic identity of people from Hong Kong. I used 2019 data from http://data.hkupop.hku.hk/v3/hkupop/ethnic_identity/ch.html. After…
0
votes
0 answers

Trouble Creating Testing/Training Features To Oversample the Minority

I am trying to recreate a tutorial made by Nick Becker. It is located at https://beckernick.github.io/oversampling-modeling/ The code he has posted works when you copy and paste it into a Jupyter Notebook. I am trying to recreate this with a…
0
votes
1 answer

Using SMOTE to oversample a binary class; why is it returning random float values between 0 and 1?

I'm using SMOTE to resample a binary class TARGET_FRAUD which includes values 0 and 1. 0 has around 900 records, while 1 only has about 100 records. I want to oversample class 1 to around 800. This is to perform some classification modeling. #fix…
vnguyen56
0
votes
1 answer

Deep Learning with Small Datasets and SMOTE

I have a dataset with 6000 records, split into train, validation and test sets of 60-20-20. I am getting an accuracy of around 76% with XGBoost. I converted my data into time series and applied LSTM/1-D ConvNets, and the accuracy is around 60%. Is my…
-1
votes
1 answer

Imbalance in object detection

How can we use SMOTE to tackle class imbalance for object detection in TensorFlow, where the model outputs labels and bboxes? This is my code, but I am not able to figure out where and how I should use SMOTE for class imbalance. import…