Calulate the groupby (several columns) average in pandas

Question

I have a dataframe as shown below.

Unit_ID    Price           Sector    Contract_Date    Rooms
1          20              SE1       16-10-2015       2
9          40              SE1       20-10-2015       2
2          40              SE1       16-10-2016       3
2          30              SE1       16-10-2015       3
3          20              SE1       16-10-2015       3
3          10              SE1       16-10-2016       3
4          60              SE1       16-10-2016       2
5          40              SE2       16-10-2015       2
8          80              SE1       20-10-2015       2
6          80              SE2       16-10-2016       3
6          60              SE2       16-10-2015       3
7          40              SE2       16-10-2015       3
7          20              SE2       16-10-2015       3
8          120             SE2       16-10-2016       2

From the above I would like to prepare a dataframe as shown below in pandas.

Expected Output:

Sector       Rooms    Year         Average_Price
SE1          2        2015         30
SE1          2        2016         60
SE1          3        2015         25
SE1          3        2016         25
SE2          2        2015         60
SE2          2        2016         120
SE2          3        2015         50
SE2          3        2016         50

I think I should use pandas groupby

I tried following code

df['Year'] = df.Contract_Date.dt.year
df1 = df.groupby(['Sector', 'Year', 'Rooms']).Price.mean()

And does that not work? Is your `Date_Created` actually a date type? What error/results do you get? — Jon Clements, Dec 29 '19 at 11:54
@JonClements I would like to have that in proper table format, not in series. — Danish, Dec 29 '19 at 11:56
i think your code works fine, except that you have missed index 8,--> `80 SE1 20-10-2015 2` for example in your manual calculation, you are taking just the first 2 rows for `room 2` `year 2015` and sector `SE1` — anky, Dec 29 '19 at 11:56
Maybe you need to add `reset_index()` like this: `df.groupby(['Sector', 'Year', 'Rooms']).Price.mean().reset_index()` — E. Zeytinci, Dec 29 '19 at 11:56
or : `df.groupby(['Sector', 'Year', 'Rooms'],as_index=False).Price.mean()` — anky, Dec 29 '19 at 11:58

ansev · Accepted Answer · 2019-12-29T12:04:44.647

2

Use:

( df.groupby(['Sector','Rooms',df['Contract_Date'].dt.year.rename('Year')])
    .Price
    .mean()
    .rename('Average_Price')
    .reset_index() )

  Sector  Rooms  Year  Average_Price
0    SE1      2  2015      46.666667
1    SE1      2  2016      60.000000
2    SE1      3  2015      25.000000
3    SE1      3  2016      25.000000
4    SE2      2  2015      40.000000
5    SE2      2  2016     120.000000
6    SE2      3  2015      40.000000
7    SE2      3  2016      80.000000

or using groupby.agg:

( df.groupby(['Sector','Rooms',df['Contract_Date'].dt.year.rename('Year')])
    .Price
    .agg(Average_Price = 'mean')
    .reset_index() )

edited Dec 29 '19 at 12:04

answered Dec 29 '19 at 11:56

ansev

30,322
5
17
31

Was just about to suggest the `.agg` approach... – Jon Clements Dec 29 '19 at 12:04
(although instead of `.Price.agg(...)` - I might have been tempted to jump straights to the agg, eg: `.agg(Average_Price=('Price', 'mean'))` instead – Jon Clements Dec 29 '19 at 12:09
there are many variants, it is evident. – ansev Dec 29 '19 at 12:20

Calulate the groupby (several columns) average in pandas

1 Answers1