0

I have a dataframe as shown below.

Unit_ID    Price           Sector    Contract_Date    Rooms
1          20              SE1       16-10-2015       2
9          40              SE1       20-10-2015       2
2          40              SE1       16-10-2016       3
2          30              SE1       16-10-2015       3
3          20              SE1       16-10-2015       3
3          10              SE1       16-10-2016       3
4          60              SE1       16-10-2016       2
5          40              SE2       16-10-2015       2
8          80              SE1       20-10-2015       2
6          80              SE2       16-10-2016       3
6          60              SE2       16-10-2015       3
7          40              SE2       16-10-2015       3
7          20              SE2       16-10-2015       3
8          120             SE2       16-10-2016       2

From the above I would like to prepare a dataframe as shown below in pandas.

Expected Output:

Sector       Rooms    Year         Average_Price
SE1          2        2015         30
SE1          2        2016         60
SE1          3        2015         25
SE1          3        2016         25
SE2          2        2015         60
SE2          2        2016         120
SE2          3        2015         50
SE2          3        2016         50

I think I should use pandas groupby

I tried following code

df['Year'] = df.Contract_Date.dt.year
df1 = df.groupby(['Sector', 'Year', 'Rooms']).Price.mean()
Danish
  • 2,719
  • 17
  • 32
  • 1
    And does that not work? Is your `Date_Created` actually a date type? What error/results do you get? – Jon Clements Dec 29 '19 at 11:54
  • @JonClements I would like to have that in proper table format, not in series. – Danish Dec 29 '19 at 11:56
  • 1
    i think your code works fine, except that you have missed index 8,--> `80 SE1 20-10-2015 2` for example in your manual calculation, you are taking just the first 2 rows for `room 2` `year 2015` and sector `SE1` – anky Dec 29 '19 at 11:56
  • 1
    Maybe you need to add `reset_index()` like this: `df.groupby(['Sector', 'Year', 'Rooms']).Price.mean().reset_index()` – E. Zeytinci Dec 29 '19 at 11:56
  • 3
    or : `df.groupby(['Sector', 'Year', 'Rooms'],as_index=False).Price.mean()` – anky Dec 29 '19 at 11:58

1 Answers1

2

Use:

( df.groupby(['Sector','Rooms',df['Contract_Date'].dt.year.rename('Year')])
    .Price
    .mean()
    .rename('Average_Price')
    .reset_index() )

  Sector  Rooms  Year  Average_Price
0    SE1      2  2015      46.666667
1    SE1      2  2016      60.000000
2    SE1      3  2015      25.000000
3    SE1      3  2016      25.000000
4    SE2      2  2015      40.000000
5    SE2      2  2016     120.000000
6    SE2      3  2015      40.000000
7    SE2      3  2016      80.000000

or using groupby.agg:

( df.groupby(['Sector','Rooms',df['Contract_Date'].dt.year.rename('Year')])
    .Price
    .agg(Average_Price = 'mean')
    .reset_index() )
ansev
  • 30,322
  • 5
  • 17
  • 31