Puzzles with rolling windows for statsmodels RollingOLS

Question

I am quite confused with rolling windows for statsmodels RollingOLS which is described in the RollingOLS example page. It mentions that

Estimated values are aligned so that models estimated using data points t, (t+1), ..., (t + windows) are stored in location (t + windows).

I have some questions:

Q1: Assume we are at row t. If I set RollingOLS(endog, exog, window=60), it estimates the model using data from [t - 60, t] (i.e. (t - 60), (t - 59), ..., t), which has 61 observations, right? But that is a window of 61 days, not 60.

Q2: If we use model.params to extract the estimated coefficients, the coefficients at row t are the OLS results using data from [t - 60, t] (i.e. (t - 60), (t - 59), ..., t), am I right?

Q3: If my guess in Q2 is right, how do we solve the rolling problem mentioned here? That is

I want to run a rolling 100-day window OLS regression estimation, which is:

First for the 101st row, I run a regression of Y on X1, X2, X3 using the 1st to 100th rows, and estimate Y for the 101st row;

Then for the 102nd row, I run a regression of Y on X1, X2, X3 using the 2nd to 101st rows, and estimate Y for the 102nd row;

Then for the 103rd row, I run a regression of Y on X1, X2, X3 using the 3rd to 102nd rows, and estimate Y for the 103rd row;

Until the last row.

score 1 · Accepted Answer · answered Jul 12 '21 at 14:03

There is a small typo in the example. When window is n, the first value computed uses observations 0, 1, ..., n-1 and appears in res.params[n-1]. This is how it should be, so that the window size is actually enforced. You can see this here:

import numpy as np
import pandas_datareader as pdr

import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

# Download the Fama-French factors and the 10 industry portfolios (needs network access)
factors = pdr.get_data_famafrench("F-F_Research_Data_Factors", start="1-1-1926")[0]
industries = pdr.get_data_famafrench("10_Industry_Portfolios", start="1-1-1926")[0]

# Excess return of the HiTec portfolio regressed on a constant and the market factor
endog = industries.HiTec - factors.RF.values
exog = sm.add_constant(factors["Mkt-RF"])
rols = RollingOLS(endog, exog, window=60)
rres = rols.fit()

# Relabel the rows 1, 2, ... so each label equals the number of observations seen so far
params = rres.params.copy()
params.index = np.arange(1, params.shape[0] + 1)
params.iloc[57:62]

Note that the index here is 1, 2, ..., so the first estimate appears at label 60, indicating that 60 observations were used to compute it.

       const    Mkt-RF
58       NaN       NaN
59       NaN       NaN
60  0.876155  1.399240
61  0.879936  1.406578
62  0.953169  1.408826

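To get the estimate using data up to and including any point t, use res.params[t]. When the result is a pandas DataFrame, as in the code above, that means positional row access: rres.params.iloc[t].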

`shift`

If you want the estimates to be aligned "out-of-sample", so that the parameter estimates using observations up to and including observation t are aligned to t+1, you can use shift:

params.shift(1).iloc[57:62]

You see the parameter values that were at 60 are now at 61.

Thanks, Kevin. I found the example page has been modified. Now my problem is solved. — Leoalan.Huang, Jul 13 '21 at 05:18
