Dropping rows with outliers from specific columns

Question

I am building a binary classification model on a heavily unbalanced dataset (95% 1s and 5% 0s). I want to drop the rows with outliers, and I used the code below:

import numpy as np
from scipy import stats
df = df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]

However, this code also drops the rows whose label is 0. Is there a better way to drop rows with outliers in all columns except the label column?

score 2 · Accepted Answer · answered Nov 28 '20 at 20:51

Try this (assuming your label is in df["label"]):

df = df[(df["label"] == 0) | (np.abs(stats.zscore(df)) < 3).all(axis=1)]

The first condition keeps every row where df["label"] == 0, regardless of its z-score.
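Note that the line above keeps every label-0 row even when its feature values are outliers. The label-0 rows were dropped in the first place because the label column itself was scored: with 95% 1s, a label of 0 has a z-score of roughly -4.36 in that column. If the goal is instead to leave the label out of the check entirely, one possible variant is sketched below. It assumes all remaining columns are numeric; the names drop_feature_outliers, threshold, and toy are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats


def drop_feature_outliers(df, label_col="label", threshold=3.0):
    # Score only the feature columns; the label column is left out entirely.
    features = df.drop(columns=[label_col]).to_numpy(dtype=float)
    z = np.abs(stats.zscore(features, axis=0))
    keep = (z < threshold).all(axis=1)  # True where every feature is within the threshold
    return df.loc[keep]


# Hypothetical toy data: 40 rows, two with label 0, two with an extreme "x".
x = [0.0, 1.0] * 19 + [100.0, 100.0]
label = [0] + [1] * 38 + [0]
toy = pd.DataFrame({"x": x, "label": label})

cleaned = drop_feature_outliers(toy)
```

Here the label-0 row with an ordinary x survives, while both rows with x = 100 are removed, whatever their label. Whether that or the keep-all-minority-rows behaviour is preferable depends on how much of the minority class you can afford to lose.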

Bill Huang

