Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
Questions tagged [oversampling]
156 questions
3 votes, 0 answers
Python - How to differentiate SMOTE resampling from original data
I oversampled my data using SMOTE like so:
>>> from imblearn.over_sampling import SMOTE
>>> X_resampled, y_resampled = SMOTE().fit_resample(X, y)
So now X_resampled, y_resampled are larger than the original data set.
How can I tell apart the…

Shlomi Schwartz
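For the question above, one possible approach is to flag every resampled row that does not appear in the original data. This is only a sketch under assumptions: `flag_synthetic` is a hypothetical helper, the toy lists stand in for `X` and `X_resampled`, and rows are compared by exact equality, so a synthetic point that happens to coincide with an original row would not be flagged.

```python
def flag_synthetic(X_original, X_resampled):
    """Return a list of booleans: True where a resampled row is absent from the original data."""
    # Rows are converted to tuples so they can be stored in a set.
    original_rows = {tuple(row) for row in X_original}
    return [tuple(row) not in original_rows for row in X_resampled]


# Toy data standing in for X and for the output of SMOTE().fit_resample(X, y).
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
X_resampled = X + [[0.5, 0.5], [1.5, 1.0]]  # last two rows play the role of synthetic samples

is_synthetic = flag_synthetic(X, X_resampled)
print(is_synthetic)  # [False, False, False, True, True]
```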
3 votes, 1 answer
Keras: multi class imbalanced data classification is overfitting
I have a small dataset of ~1000 rows with two categorical columns [Message], [Intent]. I want to create a classification model and make predictions for new, unseen messages.
The 29 unique intents are imbalanced, ranging from 116 to 4 value…

joasa
3 votes, 1 answer
Output of shape for training after oversampling with imbalanced-learn
I am using imbalanced-learn to oversample my data. I want to know how many entries in each class there are after using the oversampling method.
This code works nicely:
from imblearn.over_sampling import SMOTE
from collections import Counter
def…

Christoph H.
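Counting the entries per class before and after oversampling, as asked above, can be done with `collections.Counter` on the label vector. The sketch below uses toy label lists in place of real oversampler output; `class_counts` is a hypothetical helper name.

```python
from collections import Counter


def class_counts(y):
    """Return a {label: count} dict, sorted by label for stable printing."""
    return dict(sorted(Counter(y).items()))


# Toy labels: y is the imbalanced input, y_resampled is a stand-in for
# what an oversampler such as SMOTE would return.
y = [0] * 6 + [1] * 2
y_resampled = y + [1] * 4

print(class_counts(y))            # {0: 6, 1: 2}
print(class_counts(y_resampled))  # {0: 6, 1: 6}
```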
3 votes, 1 answer
Python oversampling combine several samplers in a pipeline
My issue concerns the ValueError raised by the SMOTE class.
Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6
# imbalanced-learn is a package containing an implementation of SMOTE
from imblearn.over_sampling import SMOTE, ADASYN,…

Alibek Jakupov
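The error quoted above (`n_samples = 1, n_neighbors = 6`) suggests that a class with a single sample met SMOTE's default of 5 neighbours plus the sample itself. A rough way to check this in advance is to look at the smallest class size. This is a sketch, not part of imbalanced-learn: `max_usable_k_neighbors` is a hypothetical helper, and it assumes the neighbour count must stay below the size of the smallest class.

```python
from collections import Counter


def max_usable_k_neighbors(y, requested=5):
    """Return the largest neighbour count that the smallest class can support."""
    smallest = min(Counter(y).values())
    if smallest < 2:
        # A lone sample has no neighbour to interpolate with.
        raise ValueError("a class with a single sample cannot be interpolated")
    # Each sample needs `k` neighbours other than itself.
    return min(requested, smallest - 1)


y = ["a"] * 10 + ["b"] * 4
print(max_usable_k_neighbors(y))  # 3
```

The returned value could then be passed as the `k_neighbors` argument of `SMOTE`; classes with a single sample would need a different strategy, such as plain random oversampling.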
3 votes, 1 answer
Is there a package or function that can do SMOTE with continuous and categorical features?
I have an unbalanced data set with a categorical dependent variable and feature variables that are continuous and categorical. I know that the SMOTE function from the DMwR package can handle only continuous features. Is there a package that can handle…

MasterStudent1992
3 votes, 0 answers
Correct split of dependent variable values in machine learning?
I am building a machine learning model in Python and there are only categorical variables in the data set. I want a minimum precision of 90% (for the value of 1 in the dependent variable).
In the original data (the raw YTD data that I pulled from the…
user5751943
3 votes, 1 answer
SMOTE function 'subscript out of bounds'
I'm trying to implement a logistic regression as follows:
However I can't get good predictions because my class output 1 is under-represented in my data.
Therefore I'm trying to apply the SMOTE algorithm to my training set in order to get better…

T. Ciffréo
3 votes, 1 answer
Get pixel coordinates from ra, dec after oversampling FITS image
I'm looking for a way to locate the pixel coordinates on my FITS image that correspond to ra and dec positions of an object in degrees, after oversampling. This would be simple if I wasn't oversampling, but I need to. Given an unaltered FITS image,…

curious_cosmo
3 votes, 1 answer
Multi-Class Classification: SMOTE oversampling for multiple columns in a row
I have an imbalanced dataset contained in a dataframe called city_country that is made up of 5 columns:
Content of a tweet = preprocessed
An event type (e.g. tweet relates to earthquake = 'earthquake', typhoon = 'typhoon', etc.) =…

Christopher Loynes
3 votes, 1 answer
How to oversample a dataframe in Pyspark?
How to oversample a dataframe in pyspark?
df.sample(fractions, seed)
This only samples a fraction of the df; it can't oversample.

Stevven
2 votes, 0 answers
Python: how to improve classification results after having used a combination of oversampling (SMOTE) and undersampling (RandomUnderSampler)
I have a problem with imbalanced classes and a small dataset:
0 : 142
1 : 29
I try to find the right method to deal with this issue and the best algorithm.
So far, the best results I have come from using a combination of oversampling with SMOTE and…

DuneC
2 votes, 1 answer
2D Gaussian oversampling over large dataframe
I currently have a dataframe in the following format:
step tag_id x_pos y_pos
1 1 5 3
1 2 3 4
2 1 2 2
2 3 1 6
.........................
.........................
N 1 …

ebrithilotho
2 votes, 1 answer
How to oversample a 3d array?
I'm trying to predict the category of a news article based on 2 features: author name and article headline.
I have transformed both columns separately using CountVectorizer and TfidfTransformer. Thus, what I have now is a 3D array (i.e. array of list…

Brian
2 votes, 1 answer
AttributeError: 'DataFrame' object has no attribute 'name' when using SMOTE
I am using the SMOTE technique from imblearn.over_sampling in order to balance my imbalanced dataset.
Here is my sample code:
import pandas as pd
dataset=pd.read_csv('E://IOT_Netlume//hourly_data.csv')
features= dataset.iloc[:,[1,2,3,4]]
target=…

DS_Geek
2 votes, 3 answers
How can I apply SMOTE to multiclass text data
I have a multiclass dataset for which I want to use SMOTE, but I am facing a
ValueError: "sampling_strategy" can be a float only when the type of
target is binary. For multi-class, use a dict.
I want to balance my data using SMOTE or any other…

ANURAG SINGH
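The error message quoted above asks for a dict when the target is multi-class. One way to build such a dict is to map every class label to the size of the largest class. This is a minimal sketch with toy labels; `balanced_strategy` is a hypothetical helper, and balancing every class up to the majority size is an assumption about the goal.

```python
from collections import Counter


def balanced_strategy(y):
    """Map each class label to the size of the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    return {label: target for label in counts}


# Toy multiclass labels standing in for the real target column.
y = ["sports"] * 5 + ["politics"] * 3 + ["tech"] * 2
strategy = balanced_strategy(y)
print(strategy)  # {'sports': 5, 'politics': 5, 'tech': 5}
```

The resulting dict could then be passed as `SMOTE(sampling_strategy=strategy)`. The text itself would still need to be converted to numeric features before resampling.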