How to find the outlier (40, 10) in this case using IQR rule?

Question

Suppose I need to remove the outlier, which is (40, 10) in this case (see the plot below), using the IQR rule. How do I do that?

Compared to the neighbouring points, (40, 10) is definitely an outlier. However, for the y-values:
Q1 = 11.25
Q3 = 35.75
1.5 * IQR = 1.5 * (Q3 - Q1) = 36.75
Only points with a y-value lower than 11.25 - 36.75 = -25.5 or greater than 35.75 + 36.75 = 72.5 are considered outliers, so y = 10 is not flagged.
How do I find and remove (40, 10) if I must use the IQR rule?

Here's my code:

import pandas as pd
import matplotlib.pyplot as plt

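# y equals x everywhere except at x = 40, where y is set to 10
test = pd.DataFrame({'x': range(50), 'y': [i if i != 40 else 10 for i in range(50)]})

plt.figure()  # originally plt.figure(**FIGURE); FIGURE is not defined in this snippet
plt.scatter(test['x'], test['y'], marker='x')
plt.show()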

Here's the plot generated from the above code.

[Plot: scatter of y against x; every point lies on the line y = x except (40, 10)]

You are using a 1D test for a 2D problem. You could create a regression line and use the distance to the regression line to identify outliers. See e.g. [Can scipy.stats identify and mask obvious outliers?](https://stackoverflow.com/questions/10231206/can-scipy-stats-identify-and-mask-obvious-outliers) — JohanC, Sep 26 '20 at 14:19

score 0 · Accepted Answer · answered Sep 26 '20 at 17:34

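The way you are using the IQR considers only one axis at a time (here, the y-values). Taken on its own, y = 10 lies well inside the fences, so by that test the point at (40, 10) is not an outlier. It only stands out when the x and y components are considered together.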
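
To make this concrete, here is a minimal sketch that applies the IQR rule to each column of the question's `test` DataFrame separately. The helper name `iqr_outliers` is made up for illustration, and the quartiles use pandas' default linear interpolation, which reproduces the Q1 = 11.25 and Q3 = 35.75 quoted in the question.

```python
import pandas as pd

# Same data as in the question: y equals x except at x = 40, where y = 10.
test = pd.DataFrame({'x': range(50), 'y': [i if i != 40 else 10 for i in range(50)]})

def iqr_outliers(values, k=1.5):
    """Hypothetical helper: mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

flag_x = iqr_outliers(test['x'])  # fences on x: -24.5 and 73.5
flag_y = iqr_outliers(test['y'])  # fences on y: -25.5 and 72.5

print(flag_x.sum(), flag_y.sum())  # prints "0 0"
```

Neither column flags anything: 40 is an ordinary x-value and 10 is an ordinary y-value. Only the combination is unusual.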

You should use a method that considers both coordinates of each point together, such as Local Outlier Factor.
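
If the IQR rule itself has to stay, one 2D-aware option is the regression-line idea from the comment under the question: fit a line, then apply the IQR rule to the vertical distances from that line instead of to the raw y-values. The sketch below assumes a straight-line trend and uses an ordinary least-squares fit via `np.polyfit`. This is a swapped-in technique, not something the question prescribes, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
test = pd.DataFrame({'x': range(50), 'y': [i if i != 40 else 10 for i in range(50)]})

# Fit y = slope * x + intercept by ordinary least squares (assumed linear trend).
slope, intercept = np.polyfit(test['x'], test['y'], deg=1)

# Signed vertical distance of each point from the fitted line.
residuals = test['y'] - (slope * test['x'] + intercept)

# Standard 1.5 * IQR rule, applied to the residuals rather than to y.
q1, q3 = residuals.quantile(0.25), residuals.quantile(0.75)
iqr = q3 - q1
is_outlier = (residuals < q1 - 1.5 * iqr) | (residuals > q3 + 1.5 * iqr)

outliers = test[is_outlier]   # the single row with x = 40, y = 10
cleaned = test[~is_outlier]   # the remaining 49 rows
print(outliers)
```

On this data the residual of (40, 10) is about -28.7 while all the others stay between roughly -0.5 and 1.7, so it is the only point outside the fences. With noisier data the least-squares line is itself pulled towards outliers, so a robust fit may be a better choice there.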
