Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
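As a rough illustration of the two techniques, here is a minimal sketch using only the Python standard library. The helper names `random_oversample` and `random_undersample` are hypothetical (they do not come from any package mentioned in the questions below), and the 90/10 class split is just a toy example. Oversampling here duplicates randomly chosen minority rows; undersampling drops randomly chosen majority rows.

```python
import random
from collections import Counter


def _group_by_class(rows, label_of):
    """Group rows into lists keyed by their class label."""
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    return by_class


def random_oversample(rows, label_of, seed=0):
    """Hypothetical helper: duplicate random rows of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, label_of)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        # Sampling with replacement, so rows may be repeated.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def random_undersample(rows, label_of, seed=0):
    """Hypothetical helper: keep a random subset of each class so that
    every class is as small as the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, label_of)
    target = min(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        # Sampling without replacement, so no row is repeated.
        balanced.extend(rng.sample(members, target))
    return balanced


# Toy data: 90 rows labelled "Yes" and 10 labelled "No".
rows = [("Yes", i) for i in range(90)] + [("No", i) for i in range(10)]
over = random_oversample(rows, label_of=lambda r: r[0])
under = random_undersample(rows, label_of=lambda r: r[0])
print(Counter(r[0] for r in over))
print(Counter(r[0] for r in under))
```

Resampling is usually applied to the training split only, so that evaluation data keeps the original class ratio.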
Questions tagged [oversampling]
156 questions
0 votes, 1 answer
Oversampling method using R
I'm studying oversampling methods in R. Let's say I want to oversample from the data frame df.
df <- data.frame(y = rep(as.factor(c('Yes', 'No')), times = c(90, 10)),
                 x1 = rnorm(100),
                 x2 = rnorm(100))
Obviously, df has 10…

Lee
0 votes, 1 answer
Oversampling on binary classification
I am doing binary classification on a large dataset (190 columns, 500K records). The target values are 0 and 1. However, when I oversample with SMOTE, new target values appear in the y vector (0, 1, 2, for example). I do not…

Johnny Torres
0 votes, 1 answer
Error 'names' attribute [35563] must be the same length as the vector [1] after running SMOGNRegress in R
I'm trying to oversample an imbalanced dataset with a continuous target variable using SMOGNRegress from the UBL package in R.
When I run the code:
SMOGNRegress(Deceased~., normalized_data, rel = "auto", thr.rel = 0.9999, C.perc = "balance", k = 2,…

ChristianWald
0 votes, 1 answer
Eliminate hairlines from a vector graphic by converting to an oversampled bitmap and then downscaling - How with ImageMagick?
I used Apple Numbers (a spreadsheet app with styling options) to create a UX flowchart of various user interfaces of an app.
Apple Numbers has a PDF export option.
The problem is that even though some border lines in the table have been set to…

porg
0 votes, 0 answers
Multiclass Sampling Strategy
Scenario:
I am currently working on a multiclass classification problem. I have a historical dataset of 2 million records with 180 classes and need to create a model that predicts the classes accurately. I have created a model using HybridGradientboosting…

Makarand Rayate
0 votes, 1 answer
SMOTE_NC function in R: error in the output
Thank you in advance for your time!
I'm having some trouble with the SMOTE_NC function in R (https://rdrr.io/github/dongyuanwu/RSBID/man/SMOTE_NC.html). In short, I have a dataset with continuous and categorical (binary only) variables in which I…

Aezhel
0 votes, 1 answer
Using WEKA Filters in Java - Oversampling and Undersampling
I'm having trouble figuring out how to use WEKA filters in Java code. I've looked for help, but what I found seems a little dated, as I'm using WEKA 3.8.5. I'm running 3 tests. Test 1: No Filter, Test 2: weka.filters.supervised.instance.SpreadSubsample…

Damon Green
0 votes, 1 answer
Pandas oversampling ragged sequential data
Trying to use pandas to oversample my ragged data (data with different lengths).
Given the following data samples:
import pandas as pd
x = pd.DataFrame({'id':[1,1,1,2,2,3,3,3,3,4,5,6,6],'f1':[11,11,11,22,22,33,33,33,33,44,55,66,66]})
y =…

Shlomi Schwartz
0 votes, 1 answer
How can I solve the ValueError: feature mismatch in XGBoost classifier?
For my work, I have split the data and then used oversampling (due to the imbalanced distribution) and feature selection. I want to use the XGBoost classifier, but I get the following error.
ValueError Traceback (most…

ReadyToLearn
0 votes, 1 answer
R: Error in model.frame.default(formula = class ~ step + type + amount + :) : object is not a matrix
I am new to R and I am trying to play around with the data from here. I tried to oversample it, but an error in model.frame.default occurs.
First attempt:
oversample_data <- ovun.sample(class ~ ., data = sample_dataset, p = 0.5, seed = 1,…

WILLIAM
0 votes, 1 answer
Preferentially Sampling Based upon Value Size
I think I'm overcomplicating this, but it also has some of my colleagues stumped.
I've got a set of areas represented by polygons and I've got a column in the dataframe holding their areas. The distribution of…

jjniev01
0 votes, 2 answers
Random Oversampling with Stratified KFold - ValueError
I have a data frame that looks like this. The data set is standardized using StandardScaler, and dummy variables are added for all categorical variables. It is then split into train and test sets.
amt gender city_pop birth_year …

Poulami Basu
0 votes, 1 answer
RandomOverSampler doesn't seem to accept log transform as my y target variable
I am trying to do random oversampling on a small dataset for linear regression. However, it seems the scikit-learn sampling API doesn't work with float values as the target variable. Is there any way to solve this?
This is a sample of my y_train…

DDM
0 votes, 0 answers
Are oversampling and undersampling approaches good for building good models?
I just worked on the "Heart Failure Prediction" dataset from Kaggle ( https://www.kaggle.com/andrewmvd/heart-failure-clinical-data )
I noticed that there were more "Not dead" cases than "dead" cases, so I used SMOTETomek, which resampled my data…

Jack Froster
0 votes, 1 answer
How to correct Python AttributeError: 'SMOTE' object has no attribute 'fit_sample'
Hello: I am trying to run the following code:
os = SMOTE(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
columns = X_train.columns
os_data_X,os_data_y=os.fit_sample(X_train, y_train)
But get…

JWeds