2

I have the following dataframe:

    Timestamp   userid      Prices_USD
0   2016-12-01  6.270941895 1.08
1   2016-12-01  6.609813209 1.12
2   2016-12-01  6.632094115 9.70
3   2016-12-01  6.655789772 1.08
4   2016-12-01  6.764640751 9.33
... ... ... ...
1183    2017-03-27  6.529604089 1.08
1184    2017-03-27  6.682639674 6.72
1185    2017-03-27  6.773815105 10.0

I want to calculate, for each unique userid, their monthly spending.

I've tried the following:

sales_per_user.set_index('Timestamp',inplace=True)
sales_per_user.index = pd.to_datetime(sales_per_user.index)
m = sales_per_user.index.month
monthly_avg = sales_per_user.groupby(['userid', m]).Prices_USD.mean().to_frame()

But the resulting dataframe is this:

userid      Timestamp Prices_USD    
3.43964843  12        10.91
3.885813375 1         10.91
            2         10.91
            12        21.82

However, the timestamp column doesn't have the desired outcome. Ideally I would like

userid      Timestamp     Prices_USD    
3.43964843  2016-12       10.91
3.885813375 2017-01       10.91
            2017-02       10.91
            2017-12       21.82

How do I fix that?

Oam
  • 305
  • 5
  • 13

1 Answers1

2

Try:

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
res = df.groupby([df['userid'], df['Timestamp'].dt.to_period('M')])['Prices_USD'].sum()
print(res)

Output

userid    Timestamp
6.270942  2016-12       1.08
6.529604  2017-03       1.08
6.609813  2016-12       1.12
6.632094  2016-12       9.70
6.655790  2016-12       1.08
6.682640  2017-03       6.72
6.764641  2016-12       9.33
6.773815  2017-03      10.00
Name: Prices_USD, dtype: float64
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76