Groupby using conditions from another column

Question

here's a pretty basic question, but my brain is giving up on me and I would really appreciate some help.

I have a dataset with 10000 rows. I have an area name column with 100 unique area names. I have a type column with types ranging from 1 to 10. And I have a spend column.

I would like to group it by area name, and add a new column with an average spend per name (or even in the old spend column).

However: I only want the average of the types from 1-7. So I want to exclude any types 8, 9 or 10 that are in that are in the area. Except, if an area contains only types 8, 9 or 10. In that case, I want the average of that spend.

What I've played with, but haven't managed to actually do it:

Approach 1: Create 2 datasets, one with types1-7, another where there's only types 8, 9 or 10 in an area:

main=['1.','2.', '3.','4.', '5.', '6.', '7.']
eight_to_ten=['8.', '9.', '10.']

df_main = df[df['Type'].isin(main)]
df_main['avg_sales'] = df_main.groupby(['Area Name'])['Sales'].mean()

Approach 2:

df_new['avg_sales'] = df[df['Type'].isin(main)].groupby('Area Name')['Sales'].mean()

I assume there is a really short way of doing this, most likely without having to split the dataset into 2 and then concat it back.

Is it easier to do it with a for loop?

Any help would be appreciated

jezrael · Answer 1 · 2020-02-06T14:35:54.093

0

I believe you need filter first rows by lists and if need new column per groups use GroupBy.transform:

m1 = df['Type'].isin(main)
m2 = df['Type'].isin(eight_to_ten)
df = df_main[m1 | m2].copy()
df['avg_sales'] = df.groupby(['Area Name', m1])['Sales'].transform('mean')

Or for new DataFrame with aggregation add new array for distinguish groups:

arr = np.where(m1, 'first','second')
df1 = df.groupby(['Area Name', arr])['Sales'].mean().reset_index()

edited Feb 06 '20 at 14:35

answered Feb 06 '20 at 14:30

jezrael

822,522
95
1,334
1,252

Thanks! That's close but not quite - this dataset returns 9500 rows - looks like only the ones with types 8-10, which is a majority. I would need the final dataset to have 100 rows - so all unique area names with an average spend per that area, based on the type of store. – lala345 Feb 06 '20 at 14:54
@lala345 - Do you need [this](https://stackoverflow.com/questions/36390406/sample-each-group-after-pandas-groupby) ? – jezrael Feb 06 '20 at 14:58

Groupby using conditions from another column

1 Answers1