Questions tagged [smote]

SMOTE stands for Synthetic Minority Over-sampling Technique. This tag refers to the oversampling method commonly used in machine learning to balance the class distribution of a dataset by adding new, synthetic minority-class examples.

In machine learning, most classifiers work on the assumption that the classes in the training set are roughly balanced. When the classes are imbalanced, classifiers tend to favor predicting the majority class.

One way to overcome this is to interpolate between neighboring minority-class instances, generating synthetic samples that lie on the line segments joining them.
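The interpolation step can be illustrated with a minimal sketch. This is not the implementation of any particular library; the function name `smote_sketch`, its parameters, and the toy data are hypothetical, and the sketch assumes purely numeric features and Euclidean distance.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Illustrative only: create n_new synthetic points by interpolating
    between a minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of `base`, excluding `base` itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical toy minority class with two numeric features
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point lies between two existing minority points, each of its coordinates stays within the range spanned by the original minority samples. In practice a maintained implementation, such as the Python toolbox listed in the references below, is usually the better choice.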

Useful references:

One of the earlier publications on SMOTE: Chawla et al. 2002

One review on SMOTE: Fernández et al. 2017

Influence of datasets on SMOTE: Skryjomski et al. 2017

Python toolbox for imbalanced datasets: Lemaître et al. 2017

185 questions
2
votes
0 answers

For categorical variables generated by SMOTER

I am working on building a predictive model for a regression problem. And I am suffering from a phenomenon where the model cannot learn well due to the large number of '0's in the target variable. So, I have arrived at a SMOTER(SMOTE for Regression)…
Dai
  • 91
  • 8
2
votes
3 answers

AttributeError: module 'sklearn.metrics._dist_metrics' has no attribute 'DatasetsPair'

I'm trying to balance my data in a Jupyter notebook using SMOTE: from imblearn import over_sampling from imblearn.over_sampling import SMOTE balanced = SMOTE() x_balanced , y_balanced = balanced.fit_resample(X_train,y_train) but I'm getting the…
omerk
  • 23
  • 1
  • 4
2
votes
0 answers

smotefamily::SMOTE -> Error in get.knnx(data, query, k, algorithm) : Data non-numeric

I'm having some issues using SMOTE, from the smotefamily package, my code keeps getting this error: Error in get.knnx(data, query, k, algorithm) : Data non-numeric I'm new at R Language, I'm trying to make the following work: dados_treino_bal <-…
2
votes
0 answers

Data augmentation using SMOTE for images

I have tried two ways to apply the SMOTE function to my dataset. However, I can't figure out how to proceed with the SMOTE function. 1st method: I have applied data augmentation and then tried to apply SMOTE train_data_gen = ImageDataGenerator( …
Jenny
  • 21
  • 1
2
votes
1 answer

Image augmentation with SMOTE oversampling as batches without running out of RAM

I am trying to use an unbalanced dataset to feed a neural network. I am using colab. I found this code on kaggle which uses keras ImageDataGenerator for augmentation and SMOTE to oversample the data: Augmentation: ZOOM = [.99, 1.01] BRIGHT_RANGE =…
2
votes
1 answer

How can i impelement SMOTE inside a columnTransformer?

I'm trying to implement SMOTENC inside a column transformer. However I'm getting error. The code and the error is provided below. #Create a mask for categorical features categorical_feature_mask = X_train.dtypes == object categorical_columns =…
2
votes
1 answer

How do I apply SMOTENC to my data frame that has columns with objects and numerics?

> In: data.dtypes Out: Organization Name object Money Raised Currency (in USD) float64 Announced Date datetime64[ns] Total Funding Amount Currency (in USD) …
2
votes
1 answer

Correct way to do cross validation in a pipeline with imbalanced data

For the given imbalanced data, I have created different pipelines for standardization and one-hot encoding numeric_transformer = Pipeline(steps = [('scaler', StandardScaler())]) categorical_transformer = Pipeline(steps=['ohe',…
2
votes
1 answer

Imbalances-learn module base.py file syntax error coming up while importing SMOTE

I installed imbalanced-learn package using (Python 2.7): conda install -c conda-forge imbalanced-learn after installing it, I tried to import SMOTE from the package. from imblearn.over_sampling import SMOTE which gave the following error: File…
HawkEye04
  • 23
  • 3
2
votes
0 answers

Using SMOTE for imbalanced data

While applying SMOTE, I get the following error: "Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn, : length of 'dimnames' [2] not equal to array extent". Below is my code: bal.m <- SMOTE(Default ~.,…
Shree
  • 21
  • 1
2
votes
0 answers

Upsampling for the whole dataset or for each mini-batch

I am trying to train my convNet on a very large unbalanced dataset. It will be quite difficult to load the data altogether to memory and do the upsampling on the whole dataset. Instead, I want to load the data in mini-batches and do the upsampling…
2
votes
1 answer

AttributeError: 'DataFrame' object has no attribute 'name' when using SMOTE

I am using imblearn over_sampling SMOTE technique in order to balance my imbalanced dataset. Here is my sample code import pandas as pd dataset=pd.read_csv('E://IOT_Netlume//hourly_data.csv') features= dataset.iloc[:,[1,2,3,4]] target=…
DS_Geek
  • 53
  • 1
  • 10
2
votes
1 answer

How to save synthetic dataset in CSV file using SMOTE

I am using Credit card data for oversampling using SMOTE. I am using the code written in geeksforgeeks.org (Link) After running the following code, it states something like that: print("Before OverSampling, counts of label '1':…
1
vote
1 answer

Sklearn Pipeline - Customized 'Optional Estimator'

I have created this function below, that creates a pipeline and returns it. def make_final_pipeline(columns_transformer, onehotencoder, estimator, Name_of_estimator, index_of_categorical_features, use_smote=True): if use_smote: # Final…