0

I have calculated monthly rolling average for my data but there is a one-month lag (delay) in calculation such that if we are in October, the results need to deliver the September monthly average and so on for the rest of the year for each month. What argument can I add to my rolling average to take into account that lag?

history_data_resampled.rolling(window = 12).mean()

Vitalizzare
  • 4,496
  • 7
  • 13
  • 32
Nada
  • 9
  • 3
  • Maybe, you can use [DataFrame.shift](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shift.html) to move data one record forward. Provide the sample data and desired output, please, if that's not the case. – Vitalizzare Oct 14 '22 at 16:47

1 Answers1

0

There are multiple ways you can achieve this. I'll assume history_data_resampled has a day frequency, and that you're using pandas:

import pandas as pd
import numpy as np

# == Generating Random DataFrame ===============================
dates = pd.period_range('2021-01-01', '2022-12-31', freq='d')
random_values = np.random.randint(0, 120, dates.shape[0])
df = pd.DataFrame({'date': dates, 'value': random_values})

# == Moving average =============================================
# Generating a monthly average, with a period of 12 months, and 1 month of lag:
df.set_index('date').resample("1M")['value'].sum().rolling(12).mean()
# Outputs:
"""
date
2021-01            NaN
2021-02            NaN
2021-03            NaN
2021-04            NaN
2021-05            NaN
2021-06            NaN
2021-07            NaN
2021-08            NaN
2021-09            NaN
2021-10            NaN
2021-11            NaN
2021-12    1644.750000
2022-01    1648.416667
2022-02    1663.583333
2022-03    1696.166667
2022-04    1747.666667
2022-05    1766.750000
2022-06    1783.166667
2022-07    1825.083333
2022-08    1814.166667
2022-09    1792.250000
2022-10    1809.833333
2022-11    1789.166667
2022-12    1798.166667
Freq: M, Name: value, dtype: float64
"""

In the above example, we've resampled the data to a monthly frequency, summing all values from the same month, and then for each month, calculated the average of the previous 12 months. We're using .resample("1M")['value'].sum() to sum all observations that fall in a specific month together.

Note that before '2021-11' we couldn't calculate any value, since our data would require us to have values from '2020' in our dataset.

Ingwersen_erik
  • 1,701
  • 1
  • 2
  • 9