Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
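To make the definition concrete, here is a minimal sketch in plain Python (standard library only) of the two simplest variants. Random oversampling duplicates minority rows until the classes match. Random undersampling keeps only a random subset of the larger classes. The function names are hypothetical and not taken from any library. Both functions return row indices, so the same selection can be applied to features and labels.

```python
import random
from collections import Counter

def random_oversample(labels, seed=0):
    """Return row indices: all original rows plus minority rows drawn with
    replacement until every class matches the largest class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    indices = list(range(len(labels)))
    for cls, n in counts.items():
        members = [i for i, y in enumerate(labels) if y == cls]
        indices += rng.choices(members, k=target - n)   # with replacement
    return indices

def random_undersample(labels, seed=0):
    """Return row indices keeping a random subset of each class,
    sized to the smallest class count (without replacement)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    indices = []
    for cls in counts:
        members = [i for i, y in enumerate(labels) if y == cls]
        indices += rng.sample(members, target)
    return sorted(indices)

labels = ["a"] * 8 + ["b"] * 2            # 8:2 class ratio
over = random_oversample(labels)
under = random_undersample(labels)
print(Counter(labels[i] for i in over))   # Counter({'a': 8, 'b': 8})
print(Counter(labels[i] for i in under))  # Counter({'a': 2, 'b': 2})
```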
Questions tagged [oversampling]
156 questions
0 votes, 1 answer
Creating R's formula using Python
I am writing a program that interacts with R from Python. I have some R libraries that I want to use from my Python code. After installing rpy2, I defined the R functions I want to use in a separate .R script.
The R function…

Perl Del Rey
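One way to pass a model formula from Python to R is to build it as a string such as `y ~ x1 + x2` and convert it on the R side. rpy2 provides `robjects.Formula` for this, and R's `as.formula()` also accepts a string. The minimal sketch below covers only the string-building part, so it runs without R installed. `build_formula` is a hypothetical helper and the column names are made up. The name check is simplified: it ignores R's reserved words and names such as `.2x`.

```python
import re

# Simplified test for a syntactic R name: letters, digits, '.', '_';
# must not start with a digit or '_'
SYNTACTIC = re.compile(r"^[A-Za-z.][A-Za-z0-9._]*$")

def build_formula(response, predictors):
    """Assemble an R-style formula string such as 'y ~ x1 + x2'.
    Non-syntactic names are wrapped in backticks, as R requires."""
    def quote(name):
        return name if SYNTACTIC.match(name) else f"`{name}`"
    rhs = " + ".join(quote(p) for p in predictors) if predictors else "1"
    return f"{quote(response)} ~ {rhs}"

print(build_formula("y", ["x1", "income (k)"]))   # y ~ x1 + `income (k)`
```

With no predictors the helper returns `y ~ 1`, which R reads as the intercept-only model.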
0 votes, 1 answer
Undersampling with image data in Python
The main idea of undersampling is to randomly delete observations from the class that has plenty of them, so that the two classes appear in a more balanced ratio in our data.
So how do I undersample image data in Python? Please help.
I took the…

hilyap
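With image data, undersampling is usually done on the list of file paths rather than on pixel arrays, so no image has to be loaded only to be discarded. The minimal sketch below assumes that each image's label is already known (for example from its folder name). The paths and the helper name are hypothetical.

```python
import random
from collections import defaultdict

def undersample_paths(items, seed=42):
    """items: list of (path, label) pairs.
    Keep a random subset of each class, sized to the smallest class."""
    by_class = defaultdict(list)
    for path, label in items:
        by_class[label].append(path)
    n_min = min(len(paths) for paths in by_class.values())
    rng = random.Random(seed)
    kept = []
    for label, paths in by_class.items():
        for path in rng.sample(paths, n_min):   # without replacement
            kept.append((path, label))
    return kept

# Hypothetical file list: 6 "cat" images and 2 "dog" images
items = [(f"cat/{i}.jpg", "cat") for i in range(6)] + \
        [(f"dog/{i}.jpg", "dog") for i in range(2)]
balanced = undersample_paths(items)
```

Only the kept paths then need to be loaded, for example by the image loader of whichever framework is in use.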
0 votes, 2 answers
SMOTE-NC in R. No packages found
I have a dataset with 5 nominal and 37 categorical variables. I want to perform oversampling in R, but I cannot do so with SMOTE. I looked for SMOTE-NC, as advised by Chawla, Bowyer and Hall (2002), but I could not find any package supporting…

Kambiz Rakhshan
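Whichever R packages are available, the SMOTE-NC procedure itself is short. The sketch below implements it in numpy, following the method's description in Chawla et al. (2002):

- Continuous features are interpolated as in SMOTE.
- When finding nearest neighbours, every nominal feature that differs adds the square of the median of the continuous features' standard deviations to the squared distance.
- Each nominal feature of a synthetic row takes the value that occurs most often among the k nearest neighbours.

`smote_nc` and the toy data are hypothetical, and choosing the seed row at random is one common variant. For Python users, imbalanced-learn ships a `SMOTENC` class.

```python
import numpy as np
from collections import Counter

def smote_nc(X_num, X_cat, n_new, k=5, seed=0):
    """Sketch of SMOTE-NC on minority-class rows only (needs at least 2 rows).
    X_num: (n, p) float array of continuous features
    X_cat: (n, q) array of nominal features"""
    rng = np.random.default_rng(seed)
    n = X_num.shape[0]
    k = min(k, n - 1)
    med = np.median(X_num.std(axis=0))   # median of the continuous features' std devs
    # Squared distance: continuous part + med**2 for every nominal feature that differs
    d2 = ((X_num[:, None, :] - X_num[None, :, :]) ** 2).sum(axis=2)
    d2 = d2 + med ** 2 * (X_cat[:, None, :] != X_cat[None, :, :]).sum(axis=2)
    np.fill_diagonal(d2, np.inf)          # a row is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]

    new_num = np.empty((n_new, X_num.shape[1]))
    new_cat = np.empty((n_new, X_cat.shape[1]), dtype=X_cat.dtype)
    for j in range(n_new):
        i = rng.integers(n)               # random minority row as the seed
        nn = neighbours[i]
        partner = rng.choice(nn)          # one of its k nearest neighbours
        # Continuous features: random point between the seed row and the partner
        new_num[j] = X_num[i] + rng.random() * (X_num[partner] - X_num[i])
        # Nominal features: most frequent value among the k nearest neighbours
        for c in range(X_cat.shape[1]):
            new_cat[j, c] = Counter(X_cat[nn, c]).most_common(1)[0][0]
    return new_num, new_cat

X_num = np.array([[1.0, 2.0], [1.5, 1.8], [0.9, 2.2], [1.2, 2.1]])
X_cat = np.array([["red"], ["red"], ["blue"], ["red"]])
new_num, new_cat = smote_nc(X_num, X_cat, n_new=3, k=2)
```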
0 votes, 1 answer
Upsampling tweets using SMOTE
I have an imbalanced dataset of tweets labeled -1, 0 and +1.
I want to balance the class counts by upsampling, but I receive the following error:
tweet_train=tweet_train.reshape(-1, 1)
X_train_upsample, y_train_upsample =…

Vahid the Great
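The error message is cut off above, but a common cause is that SMOTE interpolates between numeric feature vectors. Reshaping the raw tweet strings with `reshape(-1, 1)` still leaves text, which cannot be interpolated, so the tweets have to be vectorized first.

The minimal sketch below uses a hypothetical hand-rolled bag-of-words vectorizer and made-up tweets. It shows that after vectorization, a synthetic row is simply a point between two minority rows. In practice, the usual route is scikit-learn's `CountVectorizer` or `TfidfVectorizer` followed by imbalanced-learn's `SMOTE().fit_resample(X, y)`.

```python
import numpy as np

def bag_of_words(texts):
    """Hypothetical minimal vectorizer: one count column per lower-cased token."""
    vocab = sorted({tok for t in texts for tok in t.lower().split()})
    col = {tok: j for j, tok in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for tok in t.lower().split():
            X[i, col[tok]] += 1
    return X, vocab

# Made-up tweets: label -1 is the minority class here
tweets = ["Great day", "great game", "love it", "so great", "awful day", "awful service"]
labels = np.array([1, 1, 1, 1, -1, -1])
X, vocab = bag_of_words(tweets)

# One SMOTE-style synthetic row: a random point on the segment between two minority rows
rng = np.random.default_rng(0)
a, b = np.flatnonzero(labels == -1)
synthetic = X[a] + rng.random() * (X[b] - X[a])
```

The fractional counts in `synthetic` do not correspond to any real tweet. They are points in feature space, which is all a classifier sees.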
0 votes, 1 answer
Problem with Over- and Under-Sampling with ROSE in R
I have a dataset to classify into won cases (14399) and lost cases (8677). The dataset has 912 predictor variables.
I am trying to oversample the lost cases in order to reach almost the same number as the won cases (so having 14399 cases for…
user8383689
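Bringing the 8677 lost cases up to the 14399 won cases means adding 14399 - 8677 = 5722 lost rows, for 28798 rows in total. The sketch below reaches that count with plain random oversampling in numpy, duplicating lost rows with replacement. This is not what the ROSE package's `ROSE()` function does: it generates new synthetic rows by smoothed bootstrap. The row layout (won rows first, then lost rows) is an assumption for illustration.

```python
import numpy as np

n_won, n_lost = 14399, 8677
n_extra = n_won - n_lost                      # lost rows to add
print(n_extra)                                # 5722

# Assumed layout: rows 0..14398 are won cases, rows 14399..23075 are lost cases
lost_idx = np.arange(n_won, n_won + n_lost)
rng = np.random.default_rng(0)
extra_idx = rng.choice(lost_idx, size=n_extra, replace=True)   # duplicate lost rows
resampled_idx = np.concatenate([np.arange(n_won + n_lost), extra_idx])
print(len(resampled_idx))                     # 28798
```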
0 votes, 1 answer
Upsampling with 64 Hz in R
I have data in the format below (sample data pasted here). It has 3 variables: start time, end time, and the set of values recorded between these timestamps. The sampling rate is 64 Hz.
Now I need the output in the following format, with the difference between two…

Coolsun
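At 64 Hz, consecutive samples are 1/64 s = 15.625 ms apart, so the value at position i (counting from 0) belongs at start + i/64 s. The question's sample data is not reproduced here. This Python sketch assumes each record provides a start time and its list of values, and `expand_64hz` is a hypothetical name. In R, the same idea is `start + (seq_along(values) - 1) / 64` applied to a POSIXct start time.

```python
from datetime import datetime, timedelta

def expand_64hz(start, values, rate_hz=64):
    """Pair each value with its own timestamp: sample i falls at start + i / rate_hz seconds."""
    step = timedelta(seconds=1 / rate_hz)   # 15.625 ms at 64 Hz
    return [(start + i * step, v) for i, v in enumerate(values)]

rows = expand_64hz(datetime(2020, 1, 1, 12, 0, 0), [0.1, 0.2, 0.3])
for ts, v in rows:
    print(ts.isoformat(), v)
```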
0 votes, 0 answers
ValueError: could not convert string to float SMOTE fit_sample Python Oversampling
I have a credit risk analysis dataset that looks like this:
Loan_ID Age Income(LPA) Employed_yr Education Loan_status
1 18 2.4 1 12th 1
2 46 43 26 …

noob
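SMOTE interpolates between numeric vectors, so a text column such as `Education` (values like `12th`) must be encoded as numbers before resampling. A value like that is what the float conversion fails on.

The minimal sketch below one-hot encodes that column with the standard library, using the two rows visible in the question. The second row's Education value is cut off, so a made-up value `Graduate` stands in for it. `Loan_ID` (an identifier) and `Loan_status` (the label) are left out of the features. In imbalanced-learn, `fit_sample` was later renamed `fit_resample`, and `SMOTENC` can take categorical columns directly, so check the installed version.

```python
def one_hot(rows, column):
    """Replace a string column with one 0/1 column per distinct value."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k != column}
        for c in categories:
            out[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(out)
    return encoded

rows = [
    {"Age": 18, "Income(LPA)": 2.4, "Employed_yr": 1, "Education": "12th"},
    {"Age": 46, "Income(LPA)": 43, "Employed_yr": 26, "Education": "Graduate"},  # hypothetical value
]
encoded = one_hot(rows, "Education")
X = [[float(v) for v in r.values()] for r in encoded]   # every feature now converts to float
```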
0 votes, 0 answers
Is there a more efficient way to oversample data than random.sample()?
I have a large imbalanced classification problem and want to address it by oversampling the minority classes (N(class 1) = 8.5 million, N(class n) = 3000).
For that purpose I want to draw 100,000 samples for each of the n classes by
data_oversampled =…

Quastiat
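`random.sample()` draws without replacement, so it raises `ValueError` whenever the requested size exceeds the class size (100,000 > 3000). `random.choices()` samples with replacement, but any Python-level loop over millions of rows is slow. A vectorized alternative is to draw row indices per class with numpy and index the data once, using replacement only for classes smaller than the target. The helper name and the scaled-down class sizes below are assumptions for illustration.

```python
import numpy as np

def balanced_indices(y, n_per_class, seed=0):
    """Draw n_per_class row indices for every class with one vectorized call per class.
    Classes smaller than n_per_class are sampled with replacement (oversampled),
    larger ones without replacement (undersampled)."""
    rng = np.random.default_rng(seed)
    parts = []
    for cls in np.unique(y):
        members = np.flatnonzero(y == cls)
        parts.append(rng.choice(members, size=n_per_class,
                                replace=len(members) < n_per_class))
    return np.concatenate(parts)

# Scaled-down stand-in for the question's 8.5 million vs 3000 rows
y = np.array([0] * 85_000 + [1] * 3_000)
idx = balanced_indices(y, 10_000)
# data_oversampled = data[idx]   # index the feature array once
```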
0 votes, 1 answer
Confused by the code for oversampling in R
The code below oversamples houses with more than 10 rooms. What does prob = ifelse(housing.df$ROOMS>10, 0.9, 0.01) mean? Thanks a lot.
s <- sample(row.names(housing.df), 5, prob = ifelse(housing.df$ROOMS>10, 0.9, 0.01))
housing.df[s, ]  # rows named in s, all columns

Lea DM
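`ifelse(housing.df$ROOMS > 10, 0.9, 0.01)` builds one weight per row: 0.9 for houses with more than 10 rooms and 0.01 otherwise. `sample()` treats these as relative selection weights that need not sum to 1. On each draw, a house with more than 10 rooms is therefore 90 times as likely to be picked as any other house.

The same idea in Python, as a sketch with a made-up room column: numpy's `choice` requires the probabilities to sum to 1, so they are normalized first.

```python
import numpy as np

rooms = np.array([4, 12, 6, 11, 3, 5, 15, 7])   # hypothetical ROOMS column
weights = np.where(rooms > 10, 0.9, 0.01)        # same rule as the R ifelse()
p = weights / weights.sum()                      # numpy needs probabilities summing to 1
rng = np.random.default_rng(1)
chosen = rng.choice(len(rooms), size=5, replace=False, p=p)   # 5 row positions, no repeats
sample_rows = rooms[chosen]
```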
0 votes, 0 answers
Create RandomForest training without splitting the data. I have training data in one file and test data in another file
I want to try the random forest classifier in Python without using train_test_split. I have a training dataset in one file, and I want to train the machine learning model on that training dataset and then apply the model on…

NikhilR
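With separate files, `train_test_split` is not needed. Fit the model on the rows from the training file, then call `predict` on the rows from the test file, making sure both have the same feature columns in the same order. The minimal sketch below uses scikit-learn's `RandomForestClassifier`. The two in-memory CSV strings stand in for the real files, and their column names are hypothetical. If oversampling is used, it belongs on the training file only.

```python
import io
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-ins for train.csv and test.csv; the last column is the label
train_csv = io.StringIO("f1,f2,label\n0.1,1.0,0\n0.2,0.9,0\n0.9,0.1,1\n0.8,0.2,1\n")
test_csv = io.StringIO("f1,f2,label\n0.15,0.95,0\n0.85,0.15,1\n")

train = np.loadtxt(train_csv, delimiter=",", skiprows=1)
test = np.loadtxt(test_csv, delimiter=",", skiprows=1)
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)          # learn only from the training file
predictions = model.predict(X_test)  # apply to the separate test file
accuracy = (predictions == y_test).mean()
```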
0 votes, 1 answer
Retrieve the indices for only the resampled instances after oversampling using imbalanced-learn?
For a binary text classification problem with imbalanced data, I use the imbalanced-learn class RandomOverSampler to balance the classes.
Now, I want to retrieve only the instances that were oversampled (replicated) from the original data.…

PinkBanter
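A hand-rolled random oversampler makes the answer visible. If the original rows are kept first and the copies are appended after them, every index past the original length refers to a replicated row.

imbalanced-learn's `RandomOverSampler` exposes a fitted `sample_indices_` attribute holding the indices of the selected samples. This sketch assumes, without confirming it for any particular version, that the attribute follows the same originals-first layout, so check that against the installed version before relying on it. The function below is hypothetical.

```python
import numpy as np

def random_oversample_with_indices(y, seed=0):
    """Return (indices, is_duplicate): indices into the original data for the
    resampled set, and a mask marking which entries are replicated copies."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    extra = [rng.choice(np.flatnonzero(y == c), size=target - n, replace=True)
             for c, n in zip(classes, counts)]
    indices = np.concatenate([np.arange(len(y))] + extra)  # originals first
    is_duplicate = np.arange(len(indices)) >= len(y)
    return indices, is_duplicate

y = np.array([0, 0, 0, 0, 0, 1, 1])
indices, is_dup = random_oversample_with_indices(y)
replicated_from = indices[is_dup]   # original row numbers that were copied
```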
0 votes, 1 answer
SMOTE in Python
I am trying to use SMOTE in Python and am looking for a way to manually specify the number of minority samples.
Suppose we have 100 records of one class and 10 records of another: with ratio = 1 we get 100:100, with ratio = 1/2,…

Sindhura Bonthu
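In current imbalanced-learn releases, the `ratio` argument is called `sampling_strategy`. For binary problems, a float there means the desired minority/majority ratio after resampling. A dict such as `{minority_label: 50}` requests an exact count, which is one way to specify the number manually. The arithmetic for the 100-vs-10 example is shown as a small sketch below. The helper name is hypothetical, and its rounding down is this sketch's own choice.

```python
import math

def minority_target(n_majority, n_minority, ratio):
    """Minority count after oversampling so that minority / majority == ratio
    (rounded down); oversampling never removes existing rows."""
    target = math.floor(ratio * n_majority)
    if target < n_minority:
        raise ValueError("ratio would require removing minority samples")
    return target

print(minority_target(100, 10, 1.0))   # 100 -> 100:100
print(minority_target(100, 10, 0.5))   # 50  -> 50:100, i.e. 40 synthetic samples
```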
0 votes, 1 answer
Oversampling with Leave One Out Cross Validation
I am working with an extremely imbalanced dataset with a total of 44 samples for my research project. It is a binary classification problem in which only 3 of the 44 samples belong to the minority class, and I am using Leave One Out Cross Validation. If I perform…

varshika03
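With leave-one-out, oversampling has to happen inside each fold and only on that fold's training rows. If the whole dataset is oversampled first, copies of the held-out minority sample can sit in the training set and inflate the score. The minimal sketch below applies random oversampling per fold with numpy. The 41/3 class split mirrors the question, and the function name is hypothetical.

```python
import numpy as np

def loocv_with_oversampling(y, seed=0):
    """Yield (train_idx, test_idx) pairs for leave-one-out CV, randomly oversampling
    the training rows of each fold only; the held-out row is never copied."""
    rng = np.random.default_rng(seed)
    n = len(y)
    for i in range(n):
        train = np.delete(np.arange(n), i)          # every row except the held-out one
        classes, counts = np.unique(y[train], return_counts=True)
        extra = [rng.choice(train[y[train] == c], size=counts.max() - k, replace=True)
                 for c, k in zip(classes, counts)]
        yield np.concatenate([train] + extra), np.array([i])

y = np.array([0] * 41 + [1] * 3)     # 44 samples, 3 in the minority class
folds = list(loocv_with_oversampling(y))
# for train_idx, test_idx in folds: fit on X[train_idx], score on X[test_idx]
```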
0 votes, 0 answers
Using SMOTE on training data
I have an imbalanced dataset and I want to use SMOTE. I am working with Azure ML and have read many examples on the Microsoft documentation pages. I am wondering why SMOTE is applied before the SPLIT DATA function instead of after it, on the 70%…

Mutatos
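The usual advice is the opposite order: split first, then apply SMOTE to the training portion only. When SMOTE runs before the split, synthetic rows interpolated from rows that later land in the test set end up on both sides, and the test set no longer has the real class ratio. Why the Azure ML example places SMOTE first is not stated in the question.

The minimal sketch below does split-then-SMOTE with numpy on made-up data. The 70% split mirrors the question, and the helper is hypothetical.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Plain SMOTE: random points between minority rows and their k nearest minority neighbours."""
    if rng is None:
        rng = np.random.default_rng(0)
    k = min(k, len(X_min) - 1)
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                  # a row is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    rows = rng.integers(len(X_min), size=n_new)
    partners = nn[rows, rng.integers(k, size=n_new)]
    return X_min[rows] + rng.random((n_new, 1)) * (X_min[partners] - X_min[rows])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # made-up features
y = (np.arange(100) < 10).astype(int)             # 10 minority rows, 90 majority

# 1) Split first: stratified 70 / 30 (integer arithmetic avoids float rounding)
train = np.concatenate([rng.permutation(np.flatnonzero(y == c))[: 7 * (y == c).sum() // 10]
                        for c in (0, 1)])
test = np.setdiff1d(np.arange(100), train)

# 2) SMOTE on the training rows only; the test rows stay real and untouched
X_tr, y_tr = X[train], y[train]
n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())
X_tr_bal = np.vstack([X_tr, smote(X_tr[y_tr == 1], n_new, rng=rng)])
y_tr_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])
X_te, y_te = X[test], y[test]
```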
0 votes, 1 answer
Unbalanced dataset resulting in high false positives after using SMOTE
I am working on an imbalanced binary-classification marketing dataset which has:
a No:Yes ratio of 88:12 (No = didn't buy the product, Yes = bought)
~4300 observations and 30 features (9 numeric and 21 categorical)
I divided my data into train (80%) &…

Vikrant Arora
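The rest of the question is cut off, so this sketch covers only two checks that usually matter here:

- A stratified 80/20 split made before any SMOTE step.
- Precision measured on the untouched test set, since extra false positives show up directly as lower precision.

With about 4300 rows at 88:12, there are roughly 3784 "No" and 516 "Yes" rows (treating 4300 as exact). The helper names are hypothetical.

```python
import random

def stratified_split(labels, train_frac=0.8, seed=0):
    """Split row indices so each class keeps the same share in train and test."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test

def precision(y_true, y_pred, positive="Yes"):
    """Share of predicted positives that are real positives; false positives lower it."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if tp + fp else 0.0

labels = ["No"] * 3784 + ["Yes"] * 516   # 88:12 of 4300 rows
train, test = stratified_split(labels)   # apply any SMOTE step to `train` only
```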