Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
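As a rough illustration of what adjusting the class distribution means, here is a minimal sketch of random oversampling and random undersampling using only the Python standard library. The function names `random_oversample` and `random_undersample` are hypothetical and are not taken from any particular library; packages such as imbalanced-learn provide more complete implementations.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect the rows belonging to each class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class is as large as the biggest one."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Drop randomly chosen rows until every class is as small as the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        # Sample without replacement to keep only `target` rows per class.
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


if __name__ == "__main__":
    rows = list(range(10))        # toy feature rows
    labels = [0] * 8 + [1] * 2    # 8 majority vs. 2 minority samples
    print(Counter(random_oversample(rows, labels)[1]))
    print(Counter(random_undersample(rows, labels)[1]))
```

Either function should normally be applied to the training split only, so that duplicated or discarded rows do not affect the evaluation data.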
Questions tagged [oversampling]
156 questions
0 votes
1 answer
Oversampling of image data for Keras
I am working on a Kaggle competition and trying to solve a multilabel classification problem with Keras.
My dataset is highly imbalanced. I am familiar with this concept and did it for simple machine learning datasets, but I am not sure how to deal with…

Anakin Skywalker
0 votes
0 answers
Issue with high recall, low precision after oversampling
I have a classification problem with class imbalance, and after oversampling I get high recall, accuracy, and ROC (around 0.85), while my precision and F1 are fairly low (0.50). I have used every kind of SMOTE and undersampling, but I never get to…

hippocampus
0 votes
0 answers
Interpolating lines of a Polygon
Let's suppose we have 5 (x, y) points that form a closed loop, i.e. a polygon. How can I interpolate or upsample points so that the polygon looks rounder instead of having sharp straight edges between points? E.g. see the image. What…

mbehzad
0 votes
1 answer
Before oversampling data showing 0
I am working on my dataset and quite new to this. Below is the code:
class_col_name='Creditability'
feature_names=df.columns[df.columns != class_col_name ]
# 70% training and 30% test
X_train, X_test, y_train, y_test = train_test_split(df.loc[:,…

AHF
0 votes
3 answers
How to keep/extend the index when oversampling
I've got a dataframe like this, and I want to oversample the column "role" (in a real case the number of rows/columns is much bigger than in this minimal example)
role value
pop_13vdpn1_site_1 1 1
pop_13vdpn1_site_1 1 …

psagrera
0 votes
1 answer
Oversampling the dataset with PyTorch
I'm quite new to PyTorch and Python, and I have a binary classification problem where one class has more samples than the other, so I decided to oversample the class that has fewer samples by doing more augmentation on it, so for example I…
0 votes
1 answer
Getting an error while performing an oversample operation in R
I am working on a machine learning (classification) model where my dataset is imbalanced, and I want to balance it by using the oversample() function from the 'imbalance' package in R.
Below is the code used for oversampling, where 'Final.Status' is my…

Nick
0 votes
0 answers
Duplicating samples of time series
I have a highly imbalanced dataset:
from collections import Counter
unique1, counts1 = np.unique(labels_ds , return_counts=True)
dict(zip(unique1, counts1))
print('Original dataset shape {}' .format(counts1))
#returns
#Original dataset shape…

Rim Sleimi
0 votes
0 answers
Oversampling with only nominal features: which over- or undersampling techniques are valid in this case?
I have data where all features are nominal. I applied SMOTE-NC, then found that it only works with a combination of nominal and continuous features!
There is a technique called SMOTE-N (to deal with only nominal features) in the same paper of…

Hanan
0 votes
1 answer
PySpark: oversample classes by every target variable
I wanted to know if there is any way to oversample the data using PySpark.
I have a dataset with a target variable of 10 classes. As of now I am taking each class and oversampling like below to…

Naveen Srikanth
0 votes
0 answers
Imbalanced dataset: Should I use an oversampling technique before or after feature selection?
I wonder whether it is best to apply oversampling techniques such as ADASYN before or after the feature selection steps. Thanks
0 votes
0 answers
scikit-learn and imblearn: does GridSearchCV/RandomizedSearchCV apply preprocessing to the validation set as well?
I'm currently using sklearn for a school project and I have some questions about how GridSearchCV applies preprocessing algorithms such as PCA or Factor Analysis. Let's suppose I perform a hold-out split:
X_tr, X_ts, y_tr, y_ts = train_test_split(X, y,…

Asduffo
0 votes
1 answer
SMOTE oversampling creates new data-points
I am trying to solve an imbalanced classification problem, all the input features are categorical.
Here are the value counts of each feature:
for i in X_train.columns:
    print(i + ':', X_train[i].value_counts().shape[0])
Pclass: 3
Sex: 2
…

Nitin Kumar
0 votes
1 answer
I am trying to use ROSE to help with sampling imbalance. My ovun.sample code is creating empty values, how can I fix this?
I am trying to use ROSE to help with an imbalanced dataset. I am about 90% there, but I am having trouble with my ovun.sample code. When I run the ovun.sample code, it does not create an "over", "under" or "both" dataset; the values are showing in…

Aether
0 votes
1 answer
SMOTE - Select Perc_under and Perc_Over
I am using SMOTE for the first time in R.
I am applying SMOTE to training data where the majority class (0) has 7952346 rows and the minority class (1) has 27230 rows.
I want to downsample so that I end up with close to 30000 1's and between 180000 and 200000 0's.
I am…

Dexter1611