Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
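As a rough illustration of the two techniques described above, here is a minimal sketch using only the Python standard library. The helper names (`group_by_label`, `random_oversample`, `random_undersample`) and the toy 97/3 split are hypothetical, chosen for this example; real projects typically use a dedicated library such as imblearn instead.

```python
import random
from collections import Counter


def group_by_label(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy data (an assumption for illustration): 97 majority rows, 3 minority rows.
rows = list(range(100))
labels = [0] * 97 + [1] * 3

_, over_labels = random_oversample(rows, labels)
_, under_labels = random_undersample(rows, labels)
print(Counter(over_labels))   # both classes now have 97 rows
print(Counter(under_labels))  # both classes now have 3 rows
```

Either kind of resampling is normally applied to the training split only, so that evaluation data keeps the original class ratio.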
Questions tagged [oversampling]
156 questions
1
vote
0 answers
Oversampling a class in classification problem
I have nearly 100,000 data points with 15 features, with 'disease' and 'no disease' as the target.
But my data is imbalanced: 97% of it is 'no disease' and 3% is 'disease'.
To overcome this, I manually created disease data by creating 7 copies from the…

aim
1
vote
0 answers
Imbalanced data, regression tree and SMOTE oversampling
I am trying to build a binary classification tree with the rpart package in R on a dataset, but the overall accuracy achieved by the model seems far too high (99.8%?) and the tree is huge, with many splits.
Will this be an indication of an overfitted…

Jonathan Tan
1
vote
0 answers
Cannot install imblearn to use SMOTE
I have been trying to install imblearn to use SMOTE, and I thought the installation was successful, but when I run from imblearn.over_sampling import SMOTE in my Jupyter Notebook, I get the error ImportError: cannot import name 'SMOTE'. Do you know why…

Jane Sully
1
vote
3 answers
How to oversample an array of n string elements into an array of m string elements
I would like to oversample an array of n elements into an array of m elements such that m > n.
For instance, let's take n = 3:
colors=['red','blue','green']
and set m = 7.
What I'm looking for:
…
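The excerpt above is cut off before the desired output is shown, so the following is only a sketch of two plausible readings, not the asker's intended result. It assumes that "oversampling" here means either repeating the elements in order until there are m of them, or keeping the originals and filling the remaining slots by random draws with replacement. The variable names `cycled` and `sampled` are hypothetical.

```python
import random
from itertools import cycle, islice

colors = ['red', 'blue', 'green']
m = 7

# Reading 1 (assumption): repeat the elements in order until there are m of them.
cycled = list(islice(cycle(colors), m))
print(cycled)  # ['red', 'blue', 'green', 'red', 'blue', 'green', 'red']

# Reading 2 (assumption): keep the n originals, then draw the remaining
# m - n elements at random with replacement.
rng = random.Random(0)  # fixed seed so the run is repeatable
sampled = colors + rng.choices(colors, k=m - len(colors))
print(sampled)
```

The first reading is deterministic and keeps the counts as even as possible; the second behaves like random oversampling, so counts per element can vary between seeds.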

eric lardon
0
votes
0 answers
Problems importing imblearn python package on Google Colab
I want to use SMOTE to resample my dataset. On Google Colab, I tried to import the package using:
from imblearn.over_sampling import SMOTE
I get the error:
ImportError: cannot import name '_check_X' from 'imblearn.utils._validation'…
0
votes
0 answers
Why do I get weird plots on ANN with random oversampling
I trained an ANN text classifier with labels 0 = negative and 1 = positive; I had almost three times as much positive data as negative data. I ran one experiment using random oversampling with SMOTE and one without random oversampling. I don't…

Andryan
0
votes
0 answers
Multi-class oversampling based online bagging
Can anyone give me MOOB code in Python?
Multi-class classification problems are often considered more challenging than their binary counterparts because multiple classes can increase the data complexity and aggravate the imbalanced distribution.
A…

gul
0
votes
0 answers
How to use resampling/oversampling methods to calculate the p-value of a single point or generate new data in the "tails" of a distribution?
Say we have a small number (N=10) of measurements of a variable X from an unknown distribution (possibly skewed or bimodal).
And I want to calculate the probability that another measure belongs to the same distribution, or the p-value.
How can I do…

skan
0
votes
1 answer
imblearn library BorderlineSMOTE module does not generate any synthetic data
I tried to generate synthetic data with Borderline SMOTE in the imblearn library, but no synthetic data was generated. I am working with a multiclass dataset; to generate the data, I split my dataframe into the minority class and…

Dulangi_Kanchana
0
votes
0 answers
Appropriate way to use post-stratification weights when running statistical tests in SPSS
I have used Complex Samples in SPSS (and SUDAAN in SAS, Survey in R) when working with survey data that were collected using a sampling design that was not random. For example, when an oversample was included in the data collection. Complex Samples…

Brett Wyker
0
votes
0 answers
After oversampling with SMOTE and IsolationForest, my results don't improve
My dataset:
test: class 0 = 17565, class 1 = 2435
train: class 0 = 70212, class 1 = 9788
I applied SMOTE oversampling with the IsolationForest algorithm to the training set only.
Results before oversampling:
F1 Score: 0.9278732648748262
Accuracy Score…

David
0
votes
1 answer
Can I correct the coefficient standard errors after oversampling my data?
I am trying to fit a fixed effects linear regression to my data and interpret the coefficients. I have an imbalanced dataset (~97% negative cases), which was affecting my ability to fit the model and calculate coefficients for every independent…

cbowers
0
votes
0 answers
Sample data to fit a sigmoidal histogram in Python
Hi, I am looking for a way to sample a subset of data from a larger pool of data.
For example, say I have 4000 entries, each with a value X uniformly distributed between 0 and 1.5.
import matplotlib.pyplot as plt
import random
randomlist = []
for i in…

littlefield
0
votes
0 answers
Imbalanced Dataset Challenges with Force-Displacement Curves: Seeking Solutions
I have a dataset consisting of force-displacement curves. The dataset is heavily imbalanced, with the negative class having 29,000 samples and the positive class having only 100 samples. After transforming the force-displacement curves with tsfresh,…

Fatih
0
votes
0 answers
How would you implement class imbalance treatments, such as weighted cross-entropy or focal loss, in fastText?
How demanding is this implementation? Are there any open resources for it? I would like to modify the objective function or create loss functions that would be more sensitive to the minority class in…

Vitomir Jovanovic