
If I want to remove values that fall outside the range -2σ to 2σ, how do I remove these outliers using the IQR?

I implemented this equation as follows.

iqr = df['abc'].percentile(0.75) - df['abc'].percentile(0.25)

cond1 = (df['abc'] > df['abc'].percentile(0.75) + 2 * iqr)
cond2 = (df['abc'] < df['abc'].percentile(0.25) - 2 * iqr)

df[cond1 & cond2]

Is this the right way?

SecY
    If you're getting the output you expect, then it's right. If not, provide a sample of your `df`, along with your expected output – not_speshal Apr 25 '22 at 13:47

1 Answer


This is not right. The IQR is almost never equal to σ: quartiles and standard deviations are different measures of spread (for normally distributed data, IQR ≈ 1.35σ).
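To make the difference concrete, here is a minimal sketch that compares the two quantities on synthetic data. The normally distributed sample, the seed, and the variable names are assumptions for illustration only and are not taken from the question.

```python
import numpy as np
import pandas as pd

# Synthetic, normally distributed data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
s = pd.Series(rng.normal(loc=0.0, scale=1.0, size=100_000))

sigma = s.std()                             # sample standard deviation
iqr = s.quantile(0.75) - s.quantile(0.25)   # interquartile range

# For normal data the ratio is roughly 1.35, so the IQR cannot stand in for sigma
ratio = iqr / sigma
print(f"sigma={sigma:.3f}, iqr={iqr:.3f}, ratio={ratio:.3f}")
```

For other distributions the ratio will differ, which is one more reason not to treat the two as interchangeable.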

Fortunately, you can easily compute the standard deviation of a numerical Series using Series.std().

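sigma = df['abc'].std()  # sample standard deviation (ddof=1 by default)
mean = df['abc'].mean()

# keep only the rows strictly within two standard deviations of the mean
cond1 = (df['abc'] > mean - 2 * sigma)
cond2 = (df['abc'] < mean + 2 * sigma)

df[cond1 & cond2]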
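If an IQR-based filter is what you actually want, it could be sketched as follows. This is a sketch under assumptions, not the only way to do it: the sample data is made up, and the multiplier `k = 1.5` is the conventional Tukey fence rather than the factor 2 used in the question. Two details differ from the question's code: a pandas Series provides `quantile`, not `percentile`, and the two outlier conditions are combined with `|`, because a value cannot be above the upper fence and below the lower fence at the same time.

```python
import pandas as pd

# Hypothetical sample data; 'abc' mirrors the column name used in the question
df = pd.DataFrame({'abc': [10, 11, 12, 13, 14, 15, 16, 17, 18, 100]})

q1 = df['abc'].quantile(0.25)
q3 = df['abc'].quantile(0.75)
iqr = q3 - q1

k = 1.5  # assumed multiplier (Tukey's convention); adjust to taste
lower = q1 - k * iqr
upper = q3 + k * iqr

# A row is an outlier if it lies below the lower fence OR above the upper fence
is_outlier = (df['abc'] < lower) | (df['abc'] > upper)

# Keep everything that is not an outlier
filtered = df[~is_outlier]
print(filtered)
```

Here the value 100 falls above the upper fence and is dropped, while the other nine rows are kept.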
Benjamin Rio