1

I have a Dataframe like the following:

df = pd.DataFrame()
df['datetime'] = pd.date_range(start='2023-1-2', end='2023-1-29', freq='15min')
df['week'] = df['datetime'].apply(lambda x: int(x.isocalendar()[1]))
df['day_of_week'] = df['datetime'].dt.weekday
df['hour'] = df['datetime'].dt.hour
df['minutes'] = pd.DatetimeIndex(df['datetime']).minute
df['value'] = range(len(df))
df.set_index('datetime',inplace=True)


  df =                  week day_of_week hour minutes value
    datetime                    
    2023-01-02 00:00:00 1   0   0   0   0
    2023-01-02 00:15:00 1   0   0   15  1
    2023-01-02 00:30:00 1   0   0   30  2
    2023-01-02 00:45:00 1   0   0   45  3
    2023-01-02 01:00:00 1   0   1   0   4
    ... ... ... ... ... ...
    2023-01-08 23:00:00 1   6   23  0   668
    2023-01-08 23:15:00 1   6   23  15  669
    2023-01-08 23:30:00 1   6   23  30  670
    2023-01-08 23:45:00 1   6   23  45  671
    2023-01-09 00:00:00 2   0   0   0   672

And I want to calculate the average of the column "value" for the same hour/minute/day, every two consecutive weeks.

What I would like to get is the following:

df=
                                                    value
    day_of_week hour minutes    datetime    
              0    0       0    2023-01-02 00:00:00 NaN
                                2023-01-09 00:00:00 NaN
                                2023-01-16 00:00:00 336
                                2023-01-23 00:00:00 1008
                           15   2023-01-02 00:15:00 NaN
                                2023-01-09 00:15:00 NaN
                                2023-01-16 00:15:00 337
                                2023-01-23 00:15:00 1009

So the first two weeks should have NaN values and week-3 should be the average of week-1 and week-2 and then week-4 the average of week-2 and week-3 and so on.

I tried the following code but it does not seem to do what I expect:

df = pd.DataFrame(df.groupby(['day_of_week','hour','minutes'])['value'].rolling(window='14D', min_periods=1).mean())

As what I am getting is:

                                                value
day_of_week hour minutes.  datetime 
0           0    0         2023-01-02 00:00:00  0
                           2023-01-09 00:00:00  336
                           2023-01-16 00:00:00  1008
                           2023-01-23 00:00:00  1680
                 15        2023-01-02 00:15:00  1
                           2023-01-09 00:15:00  337
                           2023-01-16 00:15:00  1009
                           2023-01-23 00:15:00  1681
Pythoneer
  • 143
  • 8
  • That should do what you want, except for the possible `shift()` at the end. – Quang Hoang Feb 09 '23 at 19:36
  • @QuangHoang I removed the `shift()` but the solution I get it is not what I want. I have updated the description of the post with the result. – Pythoneer Feb 09 '23 at 20:49

1 Answers1

1

I think you want to shift within each group. Then you need another groupby:

(df.groupby(['day_of_week','hour','minutes'])['value']
   .rolling(window='14D', min_periods=2).mean()         # `min_periods` is different
   .groupby(['day_of_week','hour','minutes']).shift()   # shift within each group
   .to_frame()
)

Output:

                                               value
day_of_week hour minutes datetime                   
0           0    0       2023-01-02 00:00:00     NaN
                         2023-01-09 00:00:00     NaN
                         2023-01-16 00:00:00   336.0
                         2023-01-23 00:00:00  1008.0
                 15      2023-01-02 00:15:00     NaN
...                                              ...
6           23   30      2023-01-15 23:30:00     NaN
                         2023-01-22 23:30:00  1006.0
                 45      2023-01-08 23:45:00     NaN
                         2023-01-15 23:45:00     NaN
                         2023-01-22 23:45:00  1007.0
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • The reason what you put `min_periods=2` is in order for the first two entries to appear with a `NaN` value? – Pythoneer Feb 10 '23 at 18:46
  • @Pythoneer actually for the 2nd NaN value, the first NaN value comes from `shift`. Without it, you would see `0`, `1` as in your output. – Quang Hoang Feb 10 '23 at 18:48