Fixing IndexingError to clean the data

Question

I'm trying to identify outliers within each housing type, but I'm running into a problem. Whenever I run the code below, I get the following error: "IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)."

grouped = df.groupby('Type')
q1 = grouped["price"].quantile(0.25)
q3 = grouped["price"].quantile(0.75)
iqr = q3 - q1

upper_bound = q3 + (1.5 * iqr)
lower_bound = q1 - (1.5 * iqr)

outliers = df[(df["price"].reset_index(drop=True) > upper_bound[df["Type"]].reset_index(drop=True)) |
              (df["price"].reset_index(drop=True) < lower_bound[df["Type"].reset_index(drop=True)])]
print(outliers)

When I run this part of the code on its own:

(df["price"].reset_index(drop=True) > upper_bound[df["Type"]].reset_index(drop=True)).reset_index(drop = True)

I get a boolean Series, but when I pass it to df[...] it breaks.
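A minimal sketch of why the mask breaks, using hypothetical toy data (the column values and the index [0, 2, 3] are assumptions, standing in for a DataFrame that has had rows dropped). reset_index(drop=True) relabels the boolean Series as 0, 1, 2, but df still carries the labels 0, 2, 3, so pandas cannot find a mask value for label 3 and raises the IndexingError. Leaving the index alone avoids the problem.

```python
import warnings

import pandas as pd

# Hypothetical toy data; one row was dropped earlier, so the index is [0, 2, 3]
df = pd.DataFrame(
    {"Type": ["h", "h", "u"], "price": [100.0, 900.0, 300.0]},
    index=[0, 2, 3],
)

# reset_index(drop=True) relabels the mask as 0, 1, 2
bad_mask = (df["price"] > 500).reset_index(drop=True)

try:
    with warnings.catch_warnings():
        # pandas first warns that the key will be reindexed; silence that here
        warnings.simplefilter("ignore", UserWarning)
        df[bad_mask]
    error_name = None
except Exception as exc:  # pandas raises IndexingError for the missing label 3
    error_name = type(exc).__name__

# Without reset_index, the mask keeps df's labels and aligns cleanly
good_rows = df[df["price"] > 500]

print(error_name)                # IndexingError
print(good_rows.index.tolist())  # [2]
```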

score 1 · Accepted Answer · answered Feb 17 '23 at 06:35

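Use transform to compute q1 and q3. Unlike quantile on the groupby object, which returns one value per Type, transform returns a Series aligned with df's original index, so the comparisons line up row by row: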

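grouped = df.groupby("Type")

# transform returns a Series aligned with df's original index
q1 = grouped["price"].transform(lambda x: x.quantile(0.25))
q3 = grouped["price"].transform(lambda x: x.quantile(0.75))

iqr = q3 - q1

upper_bound = q3 + (1.5 * iqr)
lower_bound = q1 - (1.5 * iqr)

# Both sides share df's index, so the boolean mask aligns without reset_index
outliers = df[df["price"].gt(upper_bound) | df["price"].lt(lower_bound)]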
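A self-contained sketch of the transform approach, run on hypothetical toy data (the Type codes "h" and "u", the prices, and the index starting at 100 are assumptions chosen so that each type has one obvious outlier). The non-default index stands in for a DataFrame that has been filtered or sliced, which is exactly when the reset_index version fails.

```python
import pandas as pd

# Hypothetical data: each Type has one extreme price
df = pd.DataFrame(
    {
        "Type": ["h", "u", "h", "u", "h", "u", "h", "u", "h", "u"],
        "price": [100, 50, 110, 55, 120, 60, 130, 65, 1000, 5],
    },
    index=range(100, 110),  # non-default index, e.g. after filtering
)

grouped = df.groupby("Type")

# transform broadcasts each group's quantile back onto that group's rows,
# so q1 and q3 share df's index
q1 = grouped["price"].transform(lambda x: x.quantile(0.25))
q3 = grouped["price"].transform(lambda x: x.quantile(0.75))
iqr = q3 - q1

upper_bound = q3 + 1.5 * iqr
lower_bound = q1 - 1.5 * iqr

outliers = df[df["price"].gt(upper_bound) | df["price"].lt(lower_bound)]
print(outliers)
```

With pandas' default linear interpolation, type "h" gets bounds 80 to 160 and type "u" gets 35 to 75, so only the rows labelled 108 (price 1000) and 109 (price 5) are flagged.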

score 0 · Answer 2 · answered Feb 17 '23 at 06:34


Use Series.map to look up each row's bounds by its Type. The mapped Series keeps df's index, so reset_index is not necessary:

# map looks up each row's Type in the per-Type bounds; the result keeps df's index
outliers = df[(df["price"] > df["Type"].map(upper_bound)) |
              (df["price"] < df["Type"].map(lower_bound))]
print(outliers)
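The map approach can be checked the same way. A minimal sketch, again on hypothetical toy data: upper_bound and lower_bound are built as in the question (one value per Type, indexed by Type), and map turns them into per-row Series indexed like df.

```python
import pandas as pd

# Hypothetical data: each Type has one extreme price
df = pd.DataFrame(
    {
        "Type": ["h", "u", "h", "u", "h", "u", "h", "u", "h", "u"],
        "price": [100, 50, 110, 55, 120, 60, 130, 65, 1000, 5],
    },
    index=range(100, 110),  # non-default index, e.g. after filtering
)

# As in the question: one bound per Type, indexed by Type
grouped = df.groupby("Type")
q1 = grouped["price"].quantile(0.25)
q3 = grouped["price"].quantile(0.75)
iqr = q3 - q1
upper_bound = q3 + 1.5 * iqr
lower_bound = q1 - 1.5 * iqr

# map looks up each row's Type in the bounds; the result keeps df's index
row_upper = df["Type"].map(upper_bound)
row_lower = df["Type"].map(lower_bound)

outliers = df[(df["price"] > row_upper) | (df["price"] < row_lower)]
print(outliers)
```

This keeps the question's groupby quantile code unchanged and only replaces the lookup step, which makes it the smaller edit of the two answers.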

jezrael
