I have a large dataset of product families, and I'm trying to catch bad data entries where a price is much higher or lower than the prices of the other members of the same family. For example, I have this pandas.DataFrame:
    df =
       Prices Product Family
    0    1.99        Yoplait
    1    1.89        Yoplait
    2    1.59        Yoplait
    3    1.99        Yoplait
    4    7.99        Yoplait
    5   12.99          Hunts
    6   12.99          Hunts
    7    2.99          Hunts
    8   12.49          Hunts
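To make the example reproducible, the frame above can be rebuilt like this (column names copied from the printout; the default 0 to 8 index matches the row numbers used below):

```python
import pandas as pd

# Rebuild the example frame shown above
df = pd.DataFrame({
    "Prices": [1.99, 1.89, 1.59, 1.99, 7.99, 12.99, 12.99, 2.99, 12.49],
    "Product Family": ["Yoplait"] * 5 + ["Hunts"] * 4,
})
print(df)
```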
I want to write a for loop that iterates through each product family, applies some kind of threshold that identifies the questionable products (rows 4 and 7 here), and prints those rows. How can I do this?
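One possible threshold, offered as an assumption rather than a known-correct rule: compare each price with the median price of its family and flag it when it is more than `factor` times higher or lower. The median is used instead of the mean because a single bad entry such as 7.99 pulls the mean toward itself. The helper name `questionable_mask` and the default `factor=2.0` are hypothetical choices, and the rule assumes all prices are positive (a median of 0 would divide by zero).

```python
import pandas as pd

def questionable_mask(prices: pd.Series, factor: float = 2.0) -> pd.Series:
    """Hypothetical helper: True where a price is more than `factor` times
    above or below the median of `prices` (the prices of one family)."""
    median = prices.median()
    ratio = prices / median
    return (ratio > factor) | (ratio < 1 / factor)

yoplait = pd.Series([1.99, 1.89, 1.59, 1.99, 7.99])
hunts = pd.Series([12.99, 12.99, 2.99, 12.49], index=[5, 6, 7, 8])
print(yoplait[questionable_mask(yoplait)])  # flags index 4 (7.99)
print(hunts[questionable_mask(hunts)])      # flags index 7 (2.99)
```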
So far I have this:
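    families = df['Product Family'].unique()
    for family in families:
        prices = df.loc[df['Product Family'] == family, 'Prices']
        threshold = ...  # (set threshold) -- this is the part I'm stuck on
        if ...:          # (compare prices against the threshold)
            pass         # (print the rows that are questionable)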
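Keeping the loop structure of that attempt, one minimal way to finish it is to select each family's rows with a boolean mask, apply a threshold, and collect whatever fails. This sketch assumes the median-ratio rule with a hypothetical `FACTOR = 2.0`; any other per-family rule could replace the marked line.

```python
import pandas as pd

df = pd.DataFrame({
    "Prices": [1.99, 1.89, 1.59, 1.99, 7.99, 12.99, 12.99, 2.99, 12.49],
    "Product Family": ["Yoplait"] * 5 + ["Hunts"] * 4,
})

FACTOR = 2.0  # hypothetical: flag prices > 2x or < 0.5x the family median

flagged = []
for family in df["Product Family"].unique():
    rows = df[df["Product Family"] == family]  # all rows of this family
    median = rows["Prices"].median()
    ratio = rows["Prices"] / median
    bad = rows[(ratio > FACTOR) | (ratio < 1 / FACTOR)]  # threshold rule
    for idx, row in bad.iterrows():
        print(f"row {idx}: {family} priced {row['Prices']:.2f} "
              f"(family median {median:.2f})")
    flagged.append(bad)

# All questionable rows in one frame, keeping the original index
questionable = pd.concat(flagged)
```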
And then I would ideally finish off that if statement in the for loop for each product family. Does anyone have an idea (or a better approach) for how to set this threshold and finish the code?
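As for a possibly better approach: the loop is not strictly needed. `groupby(...).transform(...)` broadcasts each family's statistic back onto that family's rows, so the whole check becomes one boolean mask. The sketch below swaps in a different rule, the modified z-score (0.6745 times the distance from the family median, divided by the family's median absolute deviation, flagged above 3.5). The 3.5 cutoff is a commonly quoted convention, not something derived from this data.

```python
import pandas as pd

df = pd.DataFrame({
    "Prices": [1.99, 1.89, 1.59, 1.99, 7.99, 12.99, 12.99, 2.99, 12.49],
    "Product Family": ["Yoplait"] * 5 + ["Hunts"] * 4,
})

grouped = df.groupby("Product Family")["Prices"]

# Each row's family median, aligned to the original index
median = grouped.transform("median")
abs_dev = (df["Prices"] - median).abs()

# Median absolute deviation (MAD) of each family, also aligned per row
mad = abs_dev.groupby(df["Product Family"]).transform("median")

# Modified z-score; if a family's MAD is 0, pandas yields inf/NaN here
modified_z = 0.6745 * abs_dev / mad

questionable = df[modified_z > 3.5]
print(questionable)
```

Unlike an ordinary mean/standard-deviation z-score, this is not thrown off by the outlier itself: in the Yoplait family the 7.99 entry inflates the standard deviation enough that its ordinary z-score is only about 1.8. One caveat is that when more than half of a family shares a single price, the MAD is 0 and every differing price gets an infinite score.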