2

I am trying to calculate the rolling average of 4 days on the below dataset. The result should also be calculated based on group of 2 other columns.

For example:

df_time = pd.DataFrame({'A': [123, 123, 278, 278, 278, 123, 345, 278, 123,278, 278],
                        'B': [1, 2, 2, 2, 2, 1, 3, 2, 1, 2, 2],
                        'C': [0, 4, 2, 0, 4, 5, 3, 2, 1, 3, 2],
                       'D' : [pd.Timestamp('20130101'),
                                pd.Timestamp('20130102'),
                                pd.Timestamp('20130101'),
                                pd.Timestamp('20130102'),
                                pd.Timestamp('20130103'),
                                pd.Timestamp('20130103'),
                                pd.Timestamp('20130104'),
                                pd.Timestamp('20130104'),
                                pd.Timestamp('20130105'),
                                pd.Timestamp('20130106'),
                                pd.Timestamp('20130109')],
                       'rol_avg': [0, 2, 2, 1, 2, 5, 3, 3, 3, 2, 2.5] } )

df_time.sort_values(by='D')

sum_df = df_time.groupby(by=['A', 'B', 'D'])['C'].sum()

rolling_average_series = sum_df.groupby(level=[0, 1]).rolling(4, min_periods=1).mean()

returns

enter image description here

It is finding the rolling average of the past 4 occurrences of columns A and B. But what I need is rolling average of 4 days with respect to the Date column 'D' and grouped by column 'A' and 'B'

Here, for A=278 and B=2 for 09-01-2013 we have only 06-01-2013 which falls in 4days rolling average so the average should be (2+3)/2 = 2.5 How do I implement this functionality? Any help is greatly appreciated

TheDS
  • 101
  • 2
  • 11

1 Answers1

2

I hope I've understood your question well. You can specify column with on= in .rolling() and then specify window="4D" - that you want 4 days rolling window:

df_time = df_time.sort_values(by="D")
sum_df = df_time.groupby(by=["A", "B", "D"], as_index=False)["C"].sum()

rolling_average_series = (
    sum_df.groupby(["A", "B"])
    .rolling(on="D", window="4D", min_periods=1)
    .mean()
)

print(rolling_average_series)

Prints:

                  D    C
A   B                   
123 1 0  2013-01-01  0.0
      1  2013-01-03  2.5
      2  2013-01-05  3.0
    2 3  2013-01-02  4.0
278 2 4  2013-01-01  2.0
      5  2013-01-02  1.0
      6  2013-01-03  2.0
      7  2013-01-04  2.0
      8  2013-01-06  3.0
      9  2013-01-09  2.5
345 3 10 2013-01-04  3.0
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91