Questions tagged [oversampling]

Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
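As a rough illustration of the oversampling half of that description, the sketch below duplicates randomly chosen minority-class rows until every class has as many rows as the largest one. The function name `random_oversample` and the toy data are hypothetical, and this is plain random oversampling rather than any specific library's implementation.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement to make up the shortfall (zero draws for the largest class).
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: four rows of class "a", one row of class "b".
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = ["a", "a", "a", "a", "b"]
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Undersampling would be the mirror image: sampling without replacement from the larger classes down to the size of the smallest one.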

156 questions
0
votes
1 answer

Oversampling of image data for keras

I am working on a Kaggle competition and trying to solve a multilabel classification problem with Keras. My dataset is highly imbalanced. I am familiar with this concept and did it for simple machine learning datasets, but am not sure how to deal with…
Anakin Skywalker
0
votes
0 answers

Issue with high recall, low precision after oversampling

I have a classification problem with class imbalance. After oversampling, I get high recall, accuracy, and ROC (around 0.85), while my precision and F1 are fairly low (0.50). I have used every kind of SMOTE and undersampling, but I never get to…
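One possible reading of numbers like these, assuming the metrics are computed on a test set that keeps the original class imbalance: precision depends on how rare the positive class is, while recall does not. The sketch below works through that arithmetic. The 5% false positive rate and the 50% and 5% prevalence values are hypothetical figures chosen for illustration, not taken from the question.

```python
def precision(recall, false_positive_rate, prevalence):
    """Precision implied by recall, false positive rate, and positive-class share."""
    tp = recall * prevalence                       # true positives per sample
    fp = false_positive_rate * (1.0 - prevalence)  # false positives per sample
    return tp / (tp + fp)


# Same classifier behaviour (recall 0.85, false positive rate 0.05),
# evaluated at two hypothetical positive-class shares.
balanced = precision(0.85, 0.05, 0.50)    # positives are half of the test set
imbalanced = precision(0.85, 0.05, 0.05)  # positives are 5% of the test set

print(f"balanced test set:   precision = {balanced:.3f}")
print(f"imbalanced test set: precision = {imbalanced:.3f}")
```

Under these assumed figures the same recall and false positive rate give a precision near 0.94 on the balanced set and near 0.47 on the imbalanced one, so low precision alongside high recall is not by itself evidence that the resampling step failed.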
0
votes
0 answers

Interpolating lines of a Polygon

Let's suppose we have 5 (x,y) points that form a closed loop, or polygon. How can I interpolate or upsample some points so that the polygon has a rounder look instead of sharp straight lines between two points? E.g. see the image. What…
0
votes
1 answer

Before oversampling data showing 0

I am working on my dataset and am quite new to this. Below is the code: class_col_name='Creditability' feature_names=df.columns[df.columns != class_col_name ] # 70% training and 30% test X_train, X_test, y_train, y_test = train_test_split(df.loc[:,…
AHF
0
votes
3 answers

How to keep/extend index when oversampling

I've got a dataframe like this, and I want to oversample the column "role" (in the real case the number of rows/columns is much bigger than in this minimal example) role value pop_13vdpn1_site_1 1 1 pop_13vdpn1_site_1 1 …
psagrera
0
votes
1 answer

Oversampling the dataset with pytorch

I'm quite new to PyTorch and Python, and I have a binary classification problem where one class has more samples than the other, so I decided to oversample the class with fewer samples by doing more augmentation on it, so for example I…
0
votes
1 answer

Getting error while performing oversample operation in R

I am working on a machine learning model (classification) where my dataset is imbalanced, and I want to balance it by using the oversample() function from the 'imbalance' package in R. Below is the code used for oversampling, where 'Final.Status' is my…
Nick
0
votes
0 answers

Duplicating samples of time series

I have a highly imbalanced dataset: from collections import Counter unique1, counts1 = np.unique(labels_ds , return_counts=True) dict(zip(unique1, counts1)) print('Original dataset shape {}' .format(counts1)) #returns #Original dataset shape…
0
votes
0 answers

Oversampling with only nominal features: which over- or undersampling techniques could be valid in this case?

I have data where all features are nominal. I applied SMOTE-NC, then found that it only works with a combination of nominal and continuous features. There is a technique called SMOTE-N (to deal with only nominal features) in the same paper of…
0
votes
1 answer

PySpark: oversample classes by every target variable

I wanted to know if there is any way to oversample the data using PySpark. I have a dataset with a target variable of 10 classes. As of now I am taking each class and oversampling like below to…
Naveen Srikanth
0
votes
0 answers

Imbalanced dataset: Should I use an oversampling technique before or after feature selection?

I wonder whether it is best to use oversampling techniques such as ADASYN before the feature selection steps or after. Thanks
0
votes
0 answers

scikit-learn and imblearn: does GridSearchCV/RandomizedSearchCV apply preprocessing to the validation set as well?

I'm currently using sklearn for a school project and I have some questions about how GridSearchCV applies preprocessing algorithms such as PCA or Factor Analysis. Let's suppose I perform a hold-out split: X_tr, X_ts, y_tr, y_ts = train_test_split(X, y,…
0
votes
1 answer

SMOTE oversampling creates new data-points

I am trying to solve an imbalanced classification problem; all the input features are categorical. Here are the value counts of each feature: for i in X_train.columns: print(i+':',X_train[i].value_counts().shape[0]) Pclass: 3 Sex: 2 …
0
votes
1 answer

I am trying to use ROSE to help with sampling imbalance. My ovun.sample code is creating empty values, how can I fix this?

I am trying to use ROSE to help with an imbalanced dataset. I am about 90% there, but I am having trouble with my ovun.sample code. When I run the ovun.sample code, it does not create an "over", "under" or "both" dataset; the values are showing in…
Aether
0
votes
1 answer

SMOTE - Select Perc_under and Perc_Over

I am using SMOTE for the first time in R. I am applying it to training data where the majority class (0) has 7952346 rows and the minority class (1) has 27230. I want to resample so that I have close to 30000 1's and between 180000 and 200000 0's. I am…
Dexter1611