Questions tagged [oversampling]

Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).
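As a rough illustration of the description above, the following minimal sketch shows random oversampling: minority-class rows are duplicated until every class matches the largest one. It uses only the Python standard library. The function name `random_oversample` and the toy data are hypothetical, made up for this example, and do not come from any question on this page or from a particular library.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from the original data; only extra copies are appended.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy data: four "normal" rows and one "abnormal" row.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = ["normal", "normal", "normal", "normal", "abnormal"]

new_rows, new_labels = random_oversample(rows, labels)
balanced_counts = Counter(new_labels)  # both classes now have 4 rows
```

In practice this kind of resampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.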

156 questions
0 votes, 1 answer

Oversampled train set and test set - machine learning classification

Let's say that I have oversampled my training set after splitting, then I selected the features of interest to be extracted based on the training set analysis. After this, do I use the oversampled training set with the testing set together to…
0 votes, 0 answers

I want to apply oversampling to the minority classes, but it displays an error

I have a dataset that is imbalanced between normal and abnormal ultrasound liver images. I want to balance the dataset using ImageDataGenerator, but it displays an error: TypeError: __init__() got an unexpected keyword argument 'oversample' …
hume
0 votes, 0 answers

PyAudio Oversampling on Windows 10

I have written a program in Python which reads a stream of data from an external sound card. From my Mac I can connect to the sound card at 44.1, 48, 96, 192 & 384 kHz without any issues; however, on the Windows platform it is only possible…
0 votes, 0 answers

How do inputs to UBL::SmoteClassif() influence vector lengths passed to Fortran?

I'm using the UBL::SmoteClassif() function in R to oversample minority classes and create a more balanced dataset. I have 8 classes. I had a dataset with 357,038 rows and 147 columns/covariates, and it works. I have another dataset with 186,274 rows and…
Kevin
0 votes, 0 answers

Can't oversample my image data using SMOTE

I'm new to machine learning, and I have been working on a project for early dementia detection using a CNN. I am facing an issue oversampling my data. (The data is MRI images imported from Kaggle, with train and test classes having 4 sub…
0 votes, 0 answers

Process oversampled distribution data to normal distribution data?

First, sorry for my poor English skills. I have data to preprocess, and some columns' distributions look like the figure below. My first idea is to downsample the peak areas so that the data has a normal-like distribution. My question is, if…
hjsg1010
0 votes, 0 answers

How to define "sampling_strategy" in SMOTE and RandomUnderSampling for a multiclass classification problem?

I am solving a multiclass classification problem using LinearSVC(), where each class has the following number of samples (training data): Counter({7: 4799, 6: 4713, 4: 4448, 3: 419, 2: 405, 5: 324, 0: 214, 1: 64}). I tried both oversampling using SMOTE and…
0 votes, 0 answers

SMOTE runs forever with no result (small dataset)

I'm trying to use SMOTE or SMOTEENN on my dataset containing only 2000 rows. My setup is the following: sklearn '1.1.3', imblearn '0.9.1', python '3.10.8'. When running smt = SMOTEENN(random_state=42, n_jobs=-1) and X_train_SMOTE, y_train_SMOTE…
0 votes, 0 answers

Oversampling method (SMOTE, ADASYN, and borderline SMOTE)

Mates, I was supposed to deal with this data after oversampling, so I chose SMOTE, ADASYN, and BorderlineSMOTE to figure out which sampling method is best. The thing is, when I applied those three samplers, it seems they are creating exact…
Nini
0 votes, 1 answer

Combination of CalibratedClassifierCV with RandomOverSampler

When using a classifier like GaussianNB(), the resulting .predict_proba() values are sometimes poorly calibrated; that's why I'd like to wrap this classifier in sklearn's CalibratedClassifierCV. I now have a binary classification problem with only…
0 votes, 0 answers

BERT - The truth value of a DataFrame is ambiguous

I am getting into deep learning for some of my models and I am running into issues. I wanted to get it to work simply without any adjustments in the data, but I got Graph execution error: followed by a bunch of lines like File…
Rob
0 votes, 0 answers

ROSE() in R giving me negative samples when all values in training set are positive integers

I am oversampling my training dataset using ROSE() in R as below, and the oversampled dataset contains several negative values for columns that are meant to be strictly positive. The original training data is also positive, so I am surprised that…
DV24
0 votes, 0 answers

Different training score but same test score when using pipeline

I have a problem: I get a different training score when using a pipeline than when doing the steps manually. MANUAL: #standardize data sc=StandardScaler() X_train[['age','balance','duration']] =…
new_data
0 votes, 0 answers

Fit the model using the entire data or only the training data?

I am given two datasets: first, the train data with a known class (target); second, the test data with no class (no target). I split the training data into a train set and a validation set. I oversample the train data and test it on my validation set. It…
0 votes, 0 answers

Oversample using SMOTE on certain features

My dataset has 5 features. 3 out of 5 are categorical and very imbalanced. First question: should I split the data into train and test sets before applying SMOTE? Second question: if yes to Q1, how do I apply SMOTE to only 3 features while leaving the…