
I would like to filter outliers by category: for each column (fat_100g...) and each category in df['main_category_fr'], I would like to filter with the IQR method.
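For reference, applying the IQR rule per category means computing separate fences for every pair of a category c (one value of main_category_fr) and a nutrient column j. Here Q_1^{(c,j)} and Q_3^{(c,j)} denote the 25th and 75th percentiles of column j over the rows of category c only, and a value x is kept when it lies between the two fences. The factor 1.5 is the conventional choice and the one used in the loop further down, which only checks the upper fence.

```latex
\mathrm{IQR}_{c,j} = Q_3^{(c,j)} - Q_1^{(c,j)}
\qquad
Q_1^{(c,j)} - 1.5\,\mathrm{IQR}_{c,j} \;\le\; x \;\le\; Q_3^{(c,j)} + 1.5\,\mathrm{IQR}_{c,j}
```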

My dataframe df:

[Image: screenshot of df]

I have done this:

nutriments = ["fat_100g", "carbohydrates_100g", "fiber_100g", "proteins_100g", "salt_100g", "sodium_100g", "nutrition_score", "sugars_100g", "saturated-fat_100g"]

# Applied to the whole dataframe, not per category: for each column, drop rows
# above the upper fence Q3 + 1.5 * IQR, but keep rows where the value is missing
for var in nutriments:
    IQR = round(df[var].quantile(0.75) - df[var].quantile(0.25), 1)
    limite_haute = round(df[var].quantile(0.75) + (1.5 * IQR), 1)
    df = df.loc[(df[var].isnull()) | (df[var] <= limite_haute)]

But I don't know how to apply this to each category in df['main_category_fr'] in a loop.
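A minimal sketch of one way to do this, assuming the loop above is otherwise what you want: wrap it in a function and run it once on each category's sub-DataFrame, which df.groupby(...) yields as (category, sub-DataFrame) pairs. The names drop_upper_outliers and drop_upper_outliers_by_category are hypothetical, and the round(..., 1) calls are left out so the fence is not shifted by rounding.

```python
import pandas as pd


def drop_upper_outliers(frame, nutriments):
    """The loop from the question, wrapped so it can run on any (sub-)DataFrame."""
    for var in nutriments:
        q1 = frame[var].quantile(0.25)
        q3 = frame[var].quantile(0.75)
        upper = q3 + 1.5 * (q3 - q1)
        # Keep missing values, as the original loop does
        frame = frame.loc[frame[var].isnull() | (frame[var] <= upper)]
    return frame


def drop_upper_outliers_by_category(df, nutriments, cat_col="main_category_fr"):
    """Hypothetical helper: apply the same filter separately inside each category."""
    parts = []
    # groupby yields (category, sub-DataFrame) pairs; rows whose category is NaN are skipped
    for _, sub in df.groupby(cat_col):
        parts.append(drop_upper_outliers(sub, nutriments))
    # Rows come back grouped by category; return an empty frame if there were no groups
    return pd.concat(parts) if parts else df.iloc[0:0]
```

Usage would be df_clean = drop_upper_outliers_by_category(df, nutriments). As in the original loop, the columns are filtered one after another, so the quartiles of a later column are computed on rows that survived the earlier columns.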

Giordano

1 Answer


Following our discussion, you can use the code below as a starting point.

What you need is to keep only the rows where every nutrient lies within its own interval defined by the quartiles in iqr, i.e. to filter out any row where at least one nutrient falls outside its interval.

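# Q1 and Q3 of each nutrient (DataFrame.quantile skips NaN, np.quantile does not)
iqr = df[nutriments].quantile([0.25, 0.75])

# Keep rows where every nutrient lies in [Q1, Q3]; rows with a NaN are dropped
out = df[((iqr.iloc[0] <= df[nutriments])
         & (df[nutriments] <= iqr.iloc[1])).all(axis=1)]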
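The snippet above uses the raw quartiles as bounds for the whole dataframe. A minimal sketch of a per-category variant, assuming the conventional 1.5 × IQR fences and that missing values should be kept (as in the question's loop): groupby(...).transform("quantile", ...) broadcasts each category's quartiles back onto that category's rows, so the comparison stays vectorized and no explicit loop over categories is needed. filter_iqr_by_category and its k parameter are hypothetical names, not part of pandas.

```python
import pandas as pd


def filter_iqr_by_category(df, nutriments, cat_col="main_category_fr", k=1.5):
    """Keep rows whose nutrients all lie within their own category's IQR fences.

    Missing values are treated as inliers, like in the question's loop.
    """
    grouped = df.groupby(cat_col)[nutriments]
    # transform returns a frame aligned with df: each row gets its category's quartiles
    q1 = grouped.transform("quantile", q=0.25)
    q3 = grouped.transform("quantile", q=0.75)
    iqr = q3 - q1

    values = df[nutriments]
    inside = values.ge(q1 - k * iqr) & values.le(q3 + k * iqr)
    # A comparison with NaN is False, so keep missing values explicitly
    mask = (inside | values.isna()).all(axis=1)
    return df[mask]
```

Usage would be out = filter_iqr_by_category(df, nutriments). Unlike the question's loop, all fences are computed once from the unfiltered data, so dropping an outlier in one column does not change the fences used for the next column.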
Corralien