How to do time series backward resampling e.g. 5 business days starting on the last data date?

Question

I would like to compute weekly returns but starting from the end date backwards. This is my initial attempt to implement it using pandas:

import pandas as pd
import numpy as np
from pandas.tseries.offsets import BDay

index = pd.date_range(start='2020-09-13', end='2020-10-13', freq=BDay())
index_len = len(index)
dfw = pd.DataFrame(data=np.arange(start=1, stop=1+(index_len-1)*0.002, step=0.002),
                   index=index,
                   columns=['col1'])


def weekly_ret(x):
    if x.size > 0:
        print(f"range is {x.index[0]} - {x.index[-1]}")
        return (x.iloc[-1] - x.iloc[0]) / x.iloc[0]
    else:
        return np.nan


dfw = dfw.resample(rule='5B').apply(weekly_ret)
print(dfw)

then I get the following output but this is not what I want:

range is 2020-09-14 00:00:00 - 2020-09-18 00:00:00
range is 2020-09-21 00:00:00 - 2020-09-25 00:00:00
range is 2020-09-28 00:00:00 - 2020-10-02 00:00:00
range is 2020-10-05 00:00:00 - 2020-10-09 00:00:00
range is 2020-10-12 00:00:00 - 2020-10-13 00:00:00
                col1
2020-09-14  0.008000
2020-09-21  0.007921
2020-09-28  0.007843
2020-10-05  0.007767
2020-10-12  0.001923

I would like it to start from 2020-10-13 backwards so that the last range would be:

range is 2020-10-07 00:00:00 - 2020-10-13 00:00:00

instead of:

range is 2020-10-12 00:00:00 - 2020-10-13 00:00:00

What I have tried so far:

Inverting the dataframe with dfw = dfw.reindex(index=dfw.index[::-1])
The step #1 above plus having the rule to be -5B, this results in an error.
Using the origin parameter for the resample function but this has no effect on the order of the computation i.e. origin=dfw.index[-1]
The step #1 above plus computing per number of rows on the inverted dataframe dfw = dfw.rolling(5).apply(weekly_ret)[::5] but here I get a NaN for the first (last) interval and this solution is somewhat also wasteful.

UPDATE: this would be the wanted output; notice the last return considers the week starting from the last day in the index backwards:

range is 2020-09-16 00:00:00 - 2020-09-22 00:00:00 = 0.007968127490039847
range is 2020-09-23 00:00:00 - 2020-09-29 00:00:00 = 0.00788954635108482
range is 2020-09-30 00:00:00 - 2020-10-06 00:00:00 = 0.007812500000000007
range is 2020-10-07 00:00:00 - 2020-10-13 00:00:00 = 0.00773694390715668
                col1
2020-09-22  0.007968
2020-09-29  0.007890
2020-10-06  0.007813
2020-10-13  0.007737 i.e. (1.042 - 1.034)/1.034

`dfw = dfw.rolling(5,1).apply(weekly_ret)[::5]` to remove NaN — noah, Nov 20 '20 at 19:00

score 1 · Accepted Answer · answered Nov 26 '20 at 07:37

So what you're looking for are anchored offsets, i.e. resampling the DataFrame on a weekly basis, starting on the same weekday that your last index is on. In your case, 2020-10-13 is a Tuesday, i.e. you want to use the rule W-TUE. I'd suggest using a lookup dictionary to translate the .weekday() number (e.g. Tuesday == 1) into the respective rule. Then you only need to apply your function over the .resample():

rule_lookup={
    0:'W-MON',
    1:'W-TUE',
    2:'W-WED',
    3:'W-THU',
    4:'W-FRI',
    5:'W-SAT',
    6:'W-SUN'
}

# get the proper rule which ends on the last date in the index
rule = rule_lookup[dfw.index[-1].weekday()] 
print(f"=> resampling using rule: {rule}")
dfw = dfw.resample(rule=rule).apply(weekly_ret)
print(dfw)

yields:

=> resampling using rule: W-TUE
range is 2020-09-14 00:00:00 - 2020-09-15 00:00:00
range is 2020-09-16 00:00:00 - 2020-09-22 00:00:00
range is 2020-09-23 00:00:00 - 2020-09-29 00:00:00
range is 2020-09-30 00:00:00 - 2020-10-06 00:00:00
range is 2020-10-07 00:00:00 - 2020-10-13 00:00:00
                col1
2020-09-15  0.002000
2020-09-22  0.007968
2020-09-29  0.007890
2020-10-06  0.007813
2020-10-13  0.007737

How to do time series backward resampling e.g. 5 business days starting on the last data date?

1 Answers1