Questions tagged [smote]

SMOTE is an abbreviation for Synthetic Minority Over-sampling Technique. This tag refers to the oversampling method commonly used in machine learning to balance the class distribution of a dataset by introducing new minority-class examples.

In machine learning, most classifiers work on the assumption that the classes in the training set are roughly balanced. When the classes are imbalanced, classifiers tend towards predicting the majority class.

One way to overcome this is to interpolate between neighboring minority-class instances and so generate artificial (synthetic) minority samples.
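The interpolation idea can be illustrated with a minimal pure-Python sketch. It is a simplification, not the reference algorithm or the imbalanced-learn implementation: the function name `smote_sketch`, its parameters, and the toy points are hypothetical, and base samples are drawn at random rather than visited in order.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; `minority` is a list of
    equal-length numeric tuples belonging to the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Choose one of the k nearest minority neighbours.
        neighbour = rng.choice(others[:k])
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class points (made up for the example).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Each synthetic point lies on the line segment between an existing minority sample and one of its nearest minority neighbours, so the new samples stay inside the region already occupied by the minority class.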

Useful references:

One of the earlier publications on SMOTE: Chawla et al. 2002

A review of SMOTE: Fernández et al. 2017

Influence of datasets on SMOTE: Skryjomski et al. 2017

Python toolbox for imbalanced datasets: Lemaître et al. 2017

185 questions
1
vote
2 answers

TypeError: cannot unpack non-iterable SMOTE object - Use for NLP Email Export Classification

I am having an issue using SMOTE in an NLP project I am working on. My output shows a non-iterable SMOTE object. By using the untrained Y, it can tell there are multiple rows, so it clearly can see the values aren't null. I tried…
Rob
1
vote
3 answers

Scikit Learn Pipeline with SMOTE

I would like to create a Pipeline with SMOTE() inside, but I can't figure out where to implement it. My target value is imbalanced. Without SMOTE I have very bad results. My code: df_n = df[['user_id','signup_day', 'signup_month', 'signup_year', …
Anastasia_data
1
vote
1 answer

Error using SMOTE TypeError: cannot safely cast non-equivalent float64 to int64

I'm preparing an unbalanced dataset and would like to use a Python package called SMOTE. When I try to run the code it shows up an error: TypeError: cannot safely cast non-equivalent float64 to int64 My dataset (first 5 rows): Dataset The error…
1
vote
3 answers

AttributeError: 'NoneType' object has no attribute 'split' SMOTE

I'm resampling some data using SMOTE and getting an error like this: AttributeError: 'NoneType' object has no attribute 'split' my code : sm = SMOTE(random_state = 42) X_train_resampled, y_train_resampled = sm.fit_resample(X_train_final,…
1
vote
1 answer

Performing Random Under-sampling after SMOTE using imblearn

I am trying to implement combining over-sampling and under-sampling using RandomUnderSampler() and SMOTE(). I am working on the loan_status dataset. I have done the following split. X = df.drop(['Loan_Status'],axis=1).values # independant…
1
vote
0 answers

using SMOTE to treat imbalanced 3d array data

I have below data and here is the distribution of the classes. X shape == (477324, 5, 11) Y shape == (477324,) {0: 11986, 1: 465338} Since my dataset is imbalanced, I have tried RandomOverSampling using below code. from imblearn.over_sampling…
be_real
1
vote
0 answers

Use of smoteRegress R package in python using rpy2

I'm trying to use the R package smoteRegress in Python using rpy2. The function SmoteRegress takes the parameter form (a formula describing the prediction problem). In R this parameter is expressed by the formula Target_Variable~. which means a generic…
1
vote
0 answers

How to randomise the rebalancing of a dataset

In Python I am trying to rebalance a dataset which contains approximately 4000 transactions for a single credit card number, all ordered by time. There is a large class imbalance between genuine and fraudulent transactions, and this data…
1
vote
0 answers

Effect of SMOTE on Random Forest and Logistic Regression on a Cell2Cell churn dataset

I am analysing the effect of SMOTE on the performance of Random Forest and Logistic Regression. I have the following data from Kaggle. The data consists of around 50000 observations and 58 variables. I trained four models on it: Random…
RasM10
1
vote
0 answers

How to apply SMOTE on multivariate time series data?

I am new to multivariate time series problems. My data is imbalanced and I want to balance it, so I tried to apply imblearn.over_sampling.SMOTE on the raw time series data, but it failed. How to apply SMOTE on multivariate time series data and is…
Peniblast
1
vote
1 answer

SMOTE technique not oversampling image dataset

I am new to the imblearn library. I have an image dataset belonging to 5 categories, and the dataset is highly unbalanced. I load the images using TensorFlow's flow_from_directory function and use the SMOTE function for resampling. img_height, img_width = 224,224 # the no.…
maarij qamar
1
vote
0 answers

Configure SMOTE for Vector Prediction

I am working on a multi-label prediction task where the label is encoded as a one-hot vector such as [1, 0, 0] or [0, 1, 0] or [0, 0, 1] of type ndarray. The dataset is imbalanced. Hence, I am using SMOTE. This works and upsamples all minority…
Janothan
1
vote
1 answer

using SMOTE with tensorflow's ImageDataGenerator Flow From Directory

Using Python 3.6, TF 1.15, imblearn 0.0. I have an imbalanced data set with 3 classes: two are even, one is low. I am trying to apply SMOTE to the dataset; however, I am using flow from directory and I found out I can supposedly obtain X_train and y_train…
Ripfury
1
vote
1 answer

how to use SMOTE & feature selection together in sklearn pipeline?

from imblearn.pipeline import Pipeline from imblearn.over_sampling import SMOTE smt = SMOTE(random_state=0) pipeline_rf_smt_fs = Pipeline( [ ('preprocess',preprocessor), ('selector', SelectKBest(mutual_info_classif, k=30)), …
1
vote
1 answer

Not able to feed the combined SMOTE & RandomUnderSampler pipeline into the main pipeline

I am currently working with an imbalanced dataset, and in order to handle the imbalance, I plan on combining SMOTE and ADASYN with RandomUnderSampler, and also individual undersampling, oversampling, SMOTE & ADASYN (a total of 6 sampling ways, which I…