Highest Voted 'oversampling' Questions

0

votes

0 answers

Generating Artificial data from real data

I have a dataframe consisting of 2000 rows and 5 features (columns) as follows: my_data: Id, f1, f2, f3, f4(target_value) u1 34 sd 43 1 u1 30 fd 3 0 u1 01 …

asked May 18 '19 at 10:38

Spedo

355
3
13

0

votes

1 answer

Function for cross validation and oversampling (SMOTE)

I wrote the code below. X is a dataframe with shape (1000,5) and y is a dataframe with shape (1000,1). y is the target data to predict, and it is imbalanced. I want to apply cross validation and SMOTE. def Learning(n, est, X, y): s_k_fold =…

python cross-validation oversampling

asked May 15 '19 at 12:32

BTurkeli

91
1
2
15
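
A common concern with the cross-validation question above is that resampling before splitting lets copies of a sample reach the validation fold. The following is a minimal sketch only, assuming plain random duplication as a stand-in for SMOTE and using just the standard library; `kfold_indices` and `oversample_minority` are hypothetical helper names, not functions from scikit-learn or imbalanced-learn.

```python
import random
from collections import Counter


def kfold_indices(n_samples, n_splits, seed=0):
    """Shuffle indices and yield (train, validation) index lists per fold."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        val = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, val


def oversample_minority(indices, labels, seed=0):
    """Duplicate random samples of smaller classes until all match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Toy labels (an assumption for illustration): 90 majority and 10 minority samples.
labels = [0] * 90 + [1] * 10

fold_reports = []
for train_idx, val_idx in kfold_indices(len(labels), n_splits=5):
    # Resample the training part only; the validation fold stays untouched.
    balanced_train = oversample_minority(train_idx, labels)
    train_counts = Counter(labels[i] for i in balanced_train)
    leaked = set(balanced_train) & set(val_idx)
    fold_reports.append((train_counts, len(val_idx), leaked))
```

Because the duplicates are drawn from the training indices of each fold, no validation sample can appear in the resampled training set, and the validation fold keeps its original class distribution.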

0

votes

1 answer

ML with imbalanced binary dataset

I have a problem I am trying to solve: - imbalanced dataset with 2 classes - one class dwarfs the other one (923 vs 38) - f1_macro score when the dataset is used as-is to train RandomForestClassifier stays for TRAIN and TEST in 0.6 - 0.65…

python scikit-learn dataset resampling oversampling

asked May 10 '19 at 01:43

Greem666

919
13
24

0

votes

1 answer

Error in Rose sampling when balancing data with categorical variables

I'm trying to balance my data, in which the majority class has a proportion of 99% while the rare class has 1%. My response variable is binary and my independent variables are a mix of binary, integer and categorical variables. I'm using the ROSE function of…

r oversampling

asked Apr 27 '19 at 09:57

Cigdem

1
2

0

votes

0 answers

Does multicore processing in IPython kernel Jupyter Notebook really speed up execution time?

I'm running dataset oversampling code in a Python 3 Jupyter notebook. Snippet: sm = SVMSMOTE(random_state=42) X_res, Y_res = sm.fit_resample(X,Y) but this is taking too long to execute. When I checked the system monitor, it showed that only one…

python jupyter-notebook multicore execution-time oversampling

asked Mar 28 '19 at 16:57

cappy0704

557
2
9
30

0

votes

1 answer

What is the best way to oversample a dataframe preserving its statistical properties in Python 3?

I have the following toy df: FilterSystemO2Concentration (Percentage) ProcessChamberHumidityAbsolute (g/m3) ProcessChamberPressure (mbar) 0 0.156 1 29.5 …

python python-3.x dataframe resampling oversampling

asked Feb 13 '19 at 11:31

Miguel 2488

1,410
1
20
41

0

votes

1 answer

SMOTE Algorithm and Classification: overrated prediction success

I'm facing a problem about which I can't find any answer. I have a binary classification problem (output Y=0 or Y=1) with Y=1 the minority class (actually Y=1 indicates default of a company, with proportion=0.02 in the original…

r machine-learning cross-validation oversampling

asked Nov 06 '18 at 13:47

T. Ciffréo

126
10

0

votes

0 answers

Error in Oversampling example in R

I am running the code below for oversampling in R: varNames1 = paste0("Quote.Type","+","Quote.State","+","Forecast.Type","+","Suggested.Reseller.Discount","+","Territory","+","Pricing.Type") ctrl <- trainControl(method = "repeatedcv", …

r random-forest oversampling

asked Jul 30 '18 at 10:20

Jaivinder Negi

1
1

0

votes

0 answers

Could not balance large dataset

I tried various techniques such as oversampling, undersampling, ROSE, and both (oversampling and undersampling) on an imbalanced dataset to balance it. When I applied all these techniques to a small dataset, they perfectly…

r oversampling

asked Jul 16 '18 at 13:31

maira khan

43
1
8

0

votes

0 answers

Any reason not to use oversampling and undersampling together?

This has been bothering me for quite some time. If oversampling and undersampling both have their pros and cons, why not use them together to minimize their weaknesses? I just couldn't find a paper or an article that says they've used both or we…

machine-learning oversampling

asked Jan 29 '18 at 01:19

user8397275

131
1
8
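
The idea raised in the question above, using oversampling and undersampling together, can be sketched roughly as follows. This is an illustration under stated assumptions, not a recommendation: it uses random sampling only, the target of 200 samples per class is arbitrary, and `resample_to_size` is a hypothetical helper rather than a library function.

```python
import random
from collections import Counter


def resample_to_size(labels, target_per_class, seed=0):
    """Return sample indices with exactly target_per_class entries per class.

    Classes larger than the target are undersampled without replacement;
    smaller classes keep every sample and add random duplicates.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    selected = []
    for members in by_class.values():
        if len(members) >= target_per_class:
            selected.extend(rng.sample(members, target_per_class))
        else:
            extra = rng.choices(members, k=target_per_class - len(members))
            selected.extend(members + extra)
    return selected


# Toy data (assumed): 950 majority and 50 minority samples, meeting at 200 each.
labels = ["majority"] * 950 + ["minority"] * 50
selected = resample_to_size(labels, target_per_class=200)
counts = Counter(labels[i] for i in selected)
```

Meeting in the middle means the majority class loses fewer samples than with pure undersampling, and the minority class is duplicated less than with pure oversampling.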

0

votes

0 answers

oversampling doesn't generate new samples

My dataset has the following distribution (class: frequency): 0: 960, 1: 2093, 2: 22696, 3: 1116, 4: 2541, 5: 1298, 6: 14. I am using python-imblearn to oversample the minority class. With regular SMOTE I am…

python-2.7 machine-learning scikit-learn oversampling imblearn

asked Jan 24 '18 at 07:04

Pratik Kumar

2,211
1
17
41

0

votes

2 answers

What is the difference between stratify and StratifiedKFold in Python scikit-learn?

My data consists of 99% target variable = 1, and 1% target variable = '0'. Does stratify guarantee that the train and test sets have an equal ratio of data in terms of the target variable? As in, do they contain equal amounts of '1' and '0'? Please see…

python machine-learning scikit-learn oversampling

asked Jan 23 '18 at 13:44

user9238790
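
On the stratify question above: stratification keeps the class proportions the same in every split; it does not equalize the classes. A minimal standard-library sketch of that behaviour, assuming a 99%/1% label distribution like the one described; `stratified_split` is a hypothetical helper written for illustration, not scikit-learn's implementation.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction, seed=0):
    """Split indices so each class contributes the same fraction to the test set."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, test = [], []
    for members in by_class.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# 99% of targets are 1 and 1% are 0, as in the question.
labels = [1] * 990 + [0] * 10
train, test = stratified_split(labels, test_fraction=0.2)
train_counts = Counter(labels[i] for i in train)
test_counts = Counter(labels[i] for i in test)
```

Here the training set ends up with 792 ones and 8 zeros and the test set with 198 ones and 2 zeros, so both keep the 99:1 ratio. Getting equal amounts of each class would need a separate resampling step.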

-1

votes

1 answer

How to get indices of created samples in Imblearn

I am using different imblearn over-sampling methods on a data-set which contains ~55800 samples. About 200 are class 1, the rest class 0. I am oversampling class 1 with various over-sampling-strategies. It does not improve my model quality and…

machine-learning oversampling imblearn

asked Apr 07 '20 at 07:51

Andreas bleYel

463
2
5
7

-1

votes

2 answers

Create row of most frequent value in each dataframe column

CONTEXT I want to create a top row with the most frequent values of each column. CURRENT CODE df = df.loc[df['Gender'] == 'M'] df = df('Gender').count() DATA SAMPLE Gender Eyes Hair Height M Brown Brown >6ft M …

python pandas dataframe pandas-groupby oversampling

asked Feb 18 '20 at 02:40

KL_

293
6
22

-1

votes

1 answer

What is the correct way of sampling a highly imbalanced dataset which has low between-feature correlation and low between-class variance?

I have a dataset with 23 features with very low correlation. The two classes have low variance between them. The classes are highly imbalanced, like the data available for fraud detection. What is a suitable approach for sampling this kind…

statistics data-science sampling oversampling imbalanced-data

asked Jan 07 '20 at 13:41

Aravind M

13
3
