-2

I am trying to do a rolling average of 12 months minus 1 month, where the index is a datetime object of business days.

I can probably iterate through the dataframe with enumerate, but this is not very efficient, and I would like to find a way to use rolling with a mask, apply a custom function or use where, but I wasn't able to make it work.

1 Answers1

0

If I got you right, you can try resample your data first and then use rolling function. To show how it works I have made an example:

First, I have generated some data:

df = pd.DataFrame(data = [[i] for i in range(0,100)], columns=['data'])
df['date'] = pd.date_range(start='11.02.2020', freq='w', periods= 100)
df = df.set_index('date')

Here are some rows from this data:

    data
date    
2020-11-08  0
2020-11-15  1
2020-11-22  2
2020-11-29  3
2020-12-06  4
... ...
2022-09-04  95
2022-09-11  96
2022-09-18  97
2022-09-25  98
2022-10-02  99

Then we apply the resampling function.

resampled_df = df.resample('m').sum()

So now, my dataframe looks like:

    data
date    
2021-09-30  98.272727
2021-10-31  120.000000
2021-11-30  137.454545
2021-12-31  153.818182
2022-01-31  176.727273
2022-02-28  194.181818
2022-03-31  211.636364
2022-04-30  226.454545
2022-05-31  250.909091
2022-06-30  268.363636
2022-07-31  290.181818
2022-08-31  307.636364
2022-09-30  320.454545
2022-10-31  310.000000

And then we can apply rolling function with mean:

result = resampled_df.rolling(window=11).mean()

window = 11 means that we use 11 months window. But after that we will have some NaN values, because there isn't enough data for first 10 values to count mean for past 11 months. So you could drop them, if you like.

result = result.dropna()

Here is the result:

    data
date    
2021-09-30  98.272727
2021-10-31  120.000000
2021-11-30  137.454545
2021-12-31  153.818182
2022-01-31  176.727273
2022-02-28  194.181818
2022-03-31  211.636364
2022-04-30  226.454545
2022-05-31  250.909091
2022-06-30  268.363636
2022-07-31  290.181818
2022-08-31  307.636364
2022-09-30  320.454545
2022-10-31  310.000000

Each row now means the average of past 11 months.

  • Thanks a lot! I mean to do a rolling average of the eleven months prior to the last month. I should have provided an example with dummy data. I want to average my data for the period running from date of the row minus 360 days to date of the row minus 30 days. To your example with dummy data, for 2022-10-31, we'd be averaging data from 2021-11-05 to 2022-10-01. In the example you've provided you're simply averaging across the 11 months, but not excluding the last month (or 30 days). – Fidel Castro Jul 13 '22 at 09:58
  • Also - what if I use a lambda (.rolling(360).apply(lambda something). – Fidel Castro Jul 13 '22 at 10:19
  • @FidelCastro Now I got it. Is your data daily without any misses? – Anna Iliukovich-Strakovskaia Jul 13 '22 at 14:42
  • bussiness days, bank holidays missing – Fidel Castro Jul 15 '22 at 10:25