I'm trying to identify outliers within each housing type category, but whenever I run the code below I get the following error: "IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)."

grouped = df.groupby('Type')
q1 = grouped["price"].quantile(0.25)
q3 = grouped["price"].quantile(0.75)
iqr = q3 - q1

upper_bound = q3 + (1.5 * iqr)
lower_bound = q1 - (1.5 * iqr)

outliers = df[(df["price"].reset_index(drop=True) > upper_bound[df["Type"]].reset_index(drop=True)) | (df["price"].reset_index(drop=True) < lower_bound[df["Type"].reset_index(drop=True)])]
print(outliers)

When I run this part of the code on its own:

(df["price"].reset_index(drop=True) > upper_bound[df["Type"]].reset_index(drop=True)).reset_index(drop = True)

I get a boolean Series, but when I use it inside df[] it breaks.

2 Answers

Use transform to compute q1/q3. Unlike the aggregated quantile, transform returns a Series with the same index as df, so the bounds align row by row:

q1 = grouped["price"].transform(lambda x: x.quantile(0.25))
q3 = grouped["price"].transform(lambda x: x.quantile(0.75))

iqr = q3 - q1

upper_bound = q3 + (1.5 * iqr)
lower_bound = q1 - (1.5 * iqr)

outliers = df[df["price"].gt(upper_bound) | df["price"].lt(lower_bound)]
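As a self-contained check, here is a minimal sketch of the same approach. The DataFrame, its prices, and its non-default index (10 to 19) are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical sample data: two housing types, one extreme price in each.
df = pd.DataFrame(
    {
        "Type": ["A"] * 5 + ["B"] * 5,
        "price": [100, 110, 120, 130, 1000, 200, 210, 220, 230, 5],
    },
    index=range(10, 20),  # deliberately not 0..n-1
)

grouped = df.groupby("Type")

# transform broadcasts each group's quantile back onto that group's rows,
# so q1 and q3 carry the same index as df.
q1 = grouped["price"].transform(lambda x: x.quantile(0.25))
q3 = grouped["price"].transform(lambda x: x.quantile(0.75))
iqr = q3 - q1

upper_bound = q3 + 1.5 * iqr
lower_bound = q1 - 1.5 * iqr

# The mask shares df's index, so boolean indexing aligns without reset_index.
mask = df["price"].gt(upper_bound) | df["price"].lt(lower_bound)
outliers = df[mask]
print(outliers)
```

Because every intermediate Series keeps the index of df, the sketch works even when df does not have the default 0..n-1 index.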
mozway

Use Series.map to look up each row's bound by its Type; reset_index is then not necessary:

outliers = df[(df["price"] > df["Type"].map(upper_bound)) | 
              (df["price"] < df["Type"].map(lower_bound))]
print(outliers)
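A minimal runnable sketch of this, assuming the bounds are computed per group with quantile as in the question. The sample DataFrame and its index are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: two housing types, one extreme price in each.
df = pd.DataFrame(
    {
        "Type": ["A"] * 5 + ["B"] * 5,
        "price": [100, 110, 120, 130, 1000, 200, 210, 220, 230, 5],
    },
    index=range(10, 20),  # deliberately not 0..n-1
)

grouped = df.groupby("Type")
q1 = grouped["price"].quantile(0.25)
q3 = grouped["price"].quantile(0.75)
iqr = q3 - q1

# These bounds have one entry per Type: their index is the Type labels,
# not the row labels of df.
upper_bound = q3 + 1.5 * iqr
lower_bound = q1 - 1.5 * iqr

# map looks up each row's Type in the bounds and returns a Series
# that is indexed like df, so the comparison aligns row by row.
row_upper = df["Type"].map(upper_bound)
row_lower = df["Type"].map(lower_bound)

outliers = df[(df["price"] > row_upper) | (df["price"] < row_lower)]
print(outliers)
```

The per-group bounds stay small (one value per Type), and map expands them to one value per row only at comparison time.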
jezrael