Questions tagged [oversampling]

Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
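
As a rough illustration of the tag description above, here is a minimal sketch of random oversampling and random undersampling in plain Python. The helper names `random_oversample` and `random_undersample`, the `label_of` callback, and the 90/10 toy data are assumptions made for this example and come from no particular library; the sketch also assumes a non-empty input.

```python
import random
from collections import Counter


def _group_by_class(rows, label_of):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    return by_class


def random_oversample(rows, label_of, seed=0):
    """Duplicate rows at random until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, label_of)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the missing rows with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def random_undersample(rows, label_of, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, label_of)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Draw without replacement so no row is repeated.
        balanced.extend(rng.sample(group, target))
    return balanced


# Hypothetical toy data: 90 "Yes" rows and 10 "No" rows.
rows = [("Yes", i) for i in range(90)] + [("No", i) for i in range(10)]
over = random_oversample(rows, label_of=lambda r: r[0])
under = random_undersample(rows, label_of=lambda r: r[0])
print(Counter(r[0] for r in over))
print(Counter(r[0] for r in under))
```

Both helpers change only the class ratio. Oversampling repeats existing minority rows and undersampling discards majority rows, so neither adds new information.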

156 questions
0
votes
1 answer

Oversampling method using R

I'm studying oversampling methods in R. Let's say I want to oversample the data in df. df <- data.frame(y=rep(as.factor(c('Yes', 'No')), times=c(90, 10)), x1=rnorm(100), x2=rnorm(100)) Obviously, df has 10…
Lee
0
votes
1 answer

Oversampling on binary classification

I am doing binary classification on a huge dataset (190 columns, 500K records). The target values are 0 and 1. However, when I oversample with SMOTE, new target values appear in the y vector (0, 1, 2, for example). I do not…
0
votes
1 answer

Error 'names' attribute [35563] must be the same length as the vector [1] after running SMOGNRegress in R

I'm trying to oversample an imbalanced dataset with a continuous target variable using SMOGNRegress from the UBL package in R. When I run the code: SMOGNRegress(Deceased~., normalized_data, rel = "auto", thr.rel = 0.9999, C.perc = "balance", k = 2,…
0
votes
1 answer

Eliminate hairlines from vector graphics by converting to an oversampled bitmap and then downscaling - How with ImageMagick?

I used Apple Numbers (a spreadsheet app with styling options) to create a UX flowchart of various user interfaces of an app. Apple Numbers has a PDF export option. The problem is that even though some border lines in the table have been set to…
porg
0
votes
0 answers

Multiclass Sampling Strategy

Scenario: I am currently working on a multiclass classification problem. I have a historical dataset of 2 million records with 180 classes and need to build a model that predicts the classes accurately. I have created a model using HybridGradientboosting…
0
votes
1 answer

SMOTE_NC function in R: error in the output

Thank you in advance for your time! I'm having some trouble with the SMOTE_NC function in R (https://rdrr.io/github/dongyuanwu/RSBID/man/SMOTE_NC.html). In short, I have a dataset with continuous and categorical (binary only) variables in which I…
0
votes
1 answer

Using WEKA Filters in Java - Oversampling and Undersampling

I'm having trouble working out how to use WEKA filters in Java code. I've looked for help, but it seems a little dated, as I'm using WEKA 3.8.5. I'm running 3 tests. Test 1: No Filter, Test 2: weka.filters.supervised.instance.SpreadSubsample…
0
votes
1 answer

Pandas oversampling ragged sequential data

Trying to use pandas to oversample my ragged data (data with different lengths). Given the following data samples: import pandas as pd x = pd.DataFrame({'id':[1,1,1,2,2,3,3,3,3,4,5,6,6],'f1':[11,11,11,22,22,33,33,33,33,44,55,66,66]}) y =…
Shlomi Schwartz
0
votes
1 answer

How can I solve the ValueError: feature mismatch in XGBoost classifier?

For my work, I have split the data and then used oversampling (due to imbalanced distribution) and feature selection. I want to use the XGBoost classifier, but I get the following error. ValueError Traceback (most…
0
votes
1 answer

R: Error in model.frame.default(formula = class ~ step + type + amount + :) : object is not a matrix

I am new to R and I am trying to play around with the data from here. I tried to oversample it, but an error in model.frame.default occurred. First attempt: oversample_data <- ovun.sample(class ~ ., data = sample_dataset, p = 0.5, seed = 1,…
WILLIAM
0
votes
1 answer

Preferentially Sampling Based upon Value Size

So, this is something I think I'm complicating far too much but it also has some of my other colleagues stumped as well. I've got a set of areas represented by polygons and I've got a column in the dataframe holding their areas. The distribution of…
jjniev01
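
One possible reading of "preferentially sampling based upon value size" is drawing items with probability that grows with a size column such as area. The sketch below assumes that reading. The function name `sample_by_size` and the toy polygon names and areas are hypothetical and not taken from the question. It draws without replacement using Efraimidis-Spirakis style keys (`u ** (1 / w)`); for sampling with replacement, `random.choices(items, weights=sizes, k=k)` would be the simpler route.

```python
import random


def sample_by_size(items, sizes, k, seed=0):
    """Draw up to k distinct items, favouring those with larger sizes."""
    rng = random.Random(seed)
    # Each item gets the key u ** (1 / w) with u uniform in [0, 1);
    # larger weights push the key towards 1, so big items tend to rank first.
    keyed = [
        (rng.random() ** (1.0 / w), item)
        for item, w in zip(items, sizes)
        if w > 0  # items with non-positive size are never selected
    ]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:k]]


# Hypothetical polygons and their areas.
polygons = ["a", "b", "c", "d"]
areas = [1.0, 5.0, 0.5, 20.0]
print(sample_by_size(polygons, areas, k=2))
```

Passing a different `seed` gives a different draw; over many draws, the largest polygon should appear far more often than the smallest.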
0
votes
2 answers

Random Oversampling with Stratified KFold - ValueError

I have a data frame that looks like this. The data set is standardized using StandardScaler, with dummy variables added for all categorical variables. It is now split into train and test sets. amt gender city_pop birth_year …
0
votes
1 answer

RandomOverSampler doesn't seem to accept log transform as my y target variable

I am trying to do random oversampling on a small dataset for linear regression. However, it seems the scikit-learn sampling API doesn't work with float values as the target variable. Is there any way to solve this? This is a sample of my y_train…
DDM
0
votes
0 answers

Are oversampling and undersampling approaches good to build good models?

I just worked on the "Heart Failure Prediction" dataset from Kaggle ( https://www.kaggle.com/andrewmvd/heart-failure-clinical-data ) and I noticed that the number of "Not dead" cases was greater than the number of "dead" cases, so I used SMOTETomek, which resampled my data…
0
votes
1 answer

How to correct Python AttributeError: 'SMOTE' object has no attribute 'fit_sample'

Hello: I am trying to run the following code: os = SMOTE(random_state=0) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0) columns = X_train.columns os_data_X,os_data_y=os.fit_sample(X_train, y_train) But get…
JWeds
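
A likely cause, assuming a recent imbalanced-learn release is installed: `fit_sample` was deprecated in favour of `fit_resample` and later removed, so calling `os.fit_resample(X_train, y_train)` is the usual fix. The sketch below is a small compatibility helper for code that has to run against both old and new versions. The helper name `resample` and the two stand-in sampler classes are hypothetical and exist only to keep the example self-contained.

```python
def resample(sampler, X, y):
    """Call fit_resample when available, else fall back to the legacy fit_sample."""
    method = getattr(sampler, "fit_resample", None) or getattr(sampler, "fit_sample", None)
    if method is None:
        raise AttributeError(
            f"{type(sampler).__name__} has neither fit_resample nor fit_sample"
        )
    return method(X, y)


# Stand-in samplers (not imbalanced-learn classes) that return the data unchanged.
class NewStyleSampler:
    def fit_resample(self, X, y):
        return X, y


class OldStyleSampler:
    def fit_sample(self, X, y):
        return X, y


print(resample(NewStyleSampler(), [[0.1], [0.2]], [0, 1]))
print(resample(OldStyleSampler(), [[0.1], [0.2]], [0, 1]))
```

With a real sampler, the call from the question would become `os_data_X, os_data_y = resample(os, X_train, y_train)`.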