Filter outliers from Pandas dataframe from all columns except one

Question

Say I have a dataframe with features and labels:

f1    f2   label
-1000 -100 1
-5    3    2
0     4    3
1     5    1
3     6    1
1000  100  2

I want to filter outliers from columns f1 and f2 to get:

f1    f2   label
-5    3    2
0     4    3
1     5    1
3     6    1

I know that I can do something like this:

data = data[(data > data.quantile(.05)) & (data < data.quantile(.95))]

But the 'label' column will also be filtered. How can I exclude certain columns from the filtering? I don't want to filter each column manually because there are dozens of them. Thanks.

score 2 · Accepted Answer · answered Jan 09 '17 at 23:15

What about the following approach? Build the mask from the feature columns only, then use it to index the full frame:

In [306]: x = data.drop(columns='label')  # the positional axis argument (drop('label', 1)) no longer works in pandas 2.x

In [307]: x.columns
Out[307]: Index(['f1', 'f2'], dtype='object')

In [308]: data[((x > x.quantile(.05)) & (x < x.quantile(.95))).all(axis=1)]
Out[308]:
   f1  f2  label
1  -5   3      2
2   0   4      3
3   1   5      1
4   3   6      1
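
For reference, here is a self-contained sketch of the same idea that can be run as a script. It assumes the sample data from the question and a reasonably recent pandas. The helper name `filter_outliers` and its `exclude`, `lower`, and `upper` parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample data from the question.
data = pd.DataFrame({
    "f1": [-1000, -5, 0, 1, 3, 1000],
    "f2": [-100, 3, 4, 5, 6, 100],
    "label": [1, 2, 3, 1, 1, 2],
})


def filter_outliers(df, exclude, lower=0.05, upper=0.95):
    """Hypothetical helper: keep rows where every non-excluded column
    lies strictly between its own lower and upper quantile."""
    features = df.drop(columns=exclude)
    lo = features.quantile(lower)   # one threshold per feature column
    hi = features.quantile(upper)
    # The comparison aligns the per-column thresholds with the columns;
    # all(axis=1) requires every feature in a row to pass.
    mask = ((features > lo) & (features < hi)).all(axis=1)
    return df[mask]


filtered = filter_outliers(data, exclude=["label"])
print(filtered)
```

Because the mask is computed on the feature columns only, the excluded columns are never compared against their own quantiles, yet they are still present in the result.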

MaxU - stand with Ukraine

btw you might find these useful [cumcount within groups](http://stackoverflow.com/a/41558148/2336654) and [cummax within groups](http://stackoverflow.com/a/41526917/2336654) – piRSquared Jan 09 '17 at 23:31