Questions tagged [oversampling]

Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).

156 questions
1
vote
0 answers

Oversampling a class in a classification problem

I have nearly 100,000 data points with 15 features and a target of 'disease' or 'no disease'. My data is imbalanced: 97% is 'no disease' and 3% is 'disease'. To overcome this, I manually created disease data by making 7 copies from the…
aim
1
vote
0 answers

Imbalanced data, regression tree and SMOTE oversampling

I am trying to build a binary classification tree with the rpart package in R on a dataset, but the overall accuracy achieved by the model is way too high (99.8%?) and the tree is huge, with many splits. Is this an indication of an overfitted…
1
vote
0 answers

Cannot install imblearn to use SMOTE

I have been trying to install imblearn to use SMOTE, and I thought it was successful, but when I type this in my Jupyter Notebook: from imblearn.over_sampling import SMOTE, I get the error ImportError: cannot import name 'SMOTE'. Do you know why…
Jane Sully
1
vote
3 answers

How to oversample an array of n string elements into an array of m string elements

I would like to oversample an array of n elements into an array of m elements such that m > n. For instance, take n=3, colors=['red','blue','green'], and set m=7. What am I looking for? …
eric lardon
0
votes
0 answers

Problems importing imblearn python package on Google Colab

I want to use SMOTE to resample my dataset. On Google Colab, when I try to import the package using: from imblearn.over_sampling import SMOTE, I get the error: ImportError: cannot import name '_check_X' from 'imblearn.utils._validation'…
0
votes
0 answers

Why do I get weird plots on an ANN with random oversampling

I trained an ANN text classifier with labels 0=negative and 1=positive; I had almost 3 times as much positive data as negative data. I ran an experiment with random oversampling using SMOTE and without random oversampling. I don't…
Andryan
0
votes
0 answers

Multi-class oversampling based online bagging

Can anyone give me MOOB code in Python? Multi-class classification problems are often considered more challenging than their binary counterparts because multiple classes can increase the data complexity and aggravate the imbalanced distribution A…
0
votes
0 answers

How to use resampling/oversampling methods to calculate the p-value of a single point or generate new data in the "tails" of a distribution?

Say we have a small number (N=10) of measurements of a variable X from an unknown distribution (possibly skewed or bimodal), and I want to calculate the probability that another measurement belongs to the same distribution, or the p-value. How can I do…
skan
0
votes
1 answer

imblearn library BorderlineSMOTE module does not generate any synthetic data

I tried to generate synthetic data with BorderlineSMOTE in the imblearn library, but no synthetic data was generated. I am working with a multiclass dataset. To generate data, I split my dataframe into the minority class and…
Dulangi_Kanchana
0
votes
0 answers

Appropriate way to use post-stratification weights when running statistical tests in SPSS

I have used Complex Samples in SPSS (and SUDAAN in SAS, Survey in R) when working with survey data that were collected using a sampling design that was not random. For example, when an oversample was included in the data collection. Complex Samples…
Brett Wyker
0
votes
0 answers

After oversampling with SMOTE and IsolationForest, my results don't improve

My test set has 17565 samples of class 0 and 2435 of class 1; my training set has 70212 of class 0 and 9788 of class 1. I applied SMOTE oversampling with the IsolationForest algorithm on just the training set. Results before oversampling: F1 Score: 0.9278732648748262, Accuracy Score…
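For readers unfamiliar with the setup in this excerpt, the following is a minimal sketch of SMOTE-style oversampling applied to the training split only, using just the Python standard library. It is not imblearn's implementation and it omits the IsolationForest step. The function name `smote_like`, the toy data, and the parameter `k` are all assumptions made for illustration.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy training split of (features, label) pairs.
# The test split is left untouched on purpose.
train = [((x / 10, x / 20), 0) for x in range(20)] + [
    ((5.0, 5.0), 1), ((5.5, 5.2), 1), ((6.0, 4.8), 1), ((5.2, 5.9), 1),
]
minority = [features for features, label in train if label == 1]
n_majority = sum(1 for _, label in train if label == 0)

# Generate just enough synthetic minority points to balance the classes.
new_points = smote_like(minority, n_new=n_majority - len(minority))
balanced_train = train + [(p, 1) for p in new_points]
```

Because synthetic points are only interpolations of existing minority samples, this kind of oversampling adds no new information, which is one reason evaluation metrics on an untouched test set may not change much.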
0
votes
1 answer

Can I correct the coefficient standard errors after oversampling my data?

I am trying to fit a fixed effects linear regression to my data and interpret the coefficients. I have an imbalanced dataset (~97% negative cases), which was affecting my ability to fit the model and calculate coefficients for every independent…
0
votes
0 answers

Sample data to fit a sigmoidal histogram in Python

Hi, I am looking for a way to sample a subset of data from a larger pool of data. For example, say I have 4000 entries which have a value X uniformly distributed between 0 and 1.5. import matplotlib.pyplot as plt import random randomlist = [] for i in…
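One possible reading of this question is that the kept subset should follow a sigmoid-shaped histogram over X. Under that assumption, a minimal sketch is to keep each entry with a probability given by a sigmoid of its value. The `midpoint` and `steepness` parameters are hypothetical choices, not values taken from the question.

```python
import math
import random


def sigmoid(x, midpoint=0.75, steepness=10.0):
    """Acceptance probability for a value x (assumed shape parameters)."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


rng = random.Random(42)

# Pool of 4000 values drawn uniformly from [0, 1.5], as in the excerpt.
pool = [rng.uniform(0, 1.5) for _ in range(4000)]

# Keep each value with probability sigmoid(x): because the pool is uniform,
# the histogram of the kept values is approximately sigmoid shaped.
subset = [x for x in pool if rng.random() < sigmoid(x)]
```

The size of `subset` is random under this scheme. If a fixed subset size is needed, one option is to draw with weights proportional to `sigmoid(x)` instead.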
0
votes
0 answers

Imbalanced Dataset Challenges with Force-Displacement Curves: Seeking Solutions

I have a dataset consisting of force-displacement curves. The dataset is heavily imbalanced, with the negative class having 29,000 samples and the positive class having only 100 samples. After transforming the force-displacement curves with tsfresh,…
0
votes
0 answers

How would you implement class imbalance treatments in fastText, for example weighted cross entropy or focal loss?

How demanding is this implementation? Are there any open resources for it? I would like to modify the objective function or create a loss function that is more sensitive to the minority class in…
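As background for the two losses named in this question, here is a minimal sketch of binary weighted cross entropy and focal loss for a single prediction, in plain Python. It does not touch fastText and says nothing about how such a loss would be wired into it. The function names and default parameter values are assumptions for illustration.

```python
import math


def weighted_cross_entropy(p, y, pos_weight=1.0):
    """Binary cross entropy for predicted probability p of class 1.
    The loss on positive (minority) examples is scaled by pos_weight."""
    if y == 1:
        return -pos_weight * math.log(p)
    return -math.log(1.0 - p)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss: cross entropy scaled by (1 - p_t) ** gamma, so that
    well-classified examples contribute little to the total loss."""
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class balancing factor
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Compare an easy positive (p=0.9) with a hard positive (p=0.1).
for p in (0.9, 0.1):
    print(p, weighted_cross_entropy(p, 1, pos_weight=3.0), focal_loss(p, 1))
```

With `gamma=0` the focal loss reduces to an alpha-weighted cross entropy, so the two can be treated as one family with a single extra parameter.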