How to forecast time series using AutoReg in python

Question

I'm trying to build an old-school model using only an autoregression algorithm. I found that there's an implementation of it in the statsmodels package. I've read the documentation and, as I understand it, it should work like ARIMA. Here's my code:

import statsmodels.api as sm
model = sm.tsa.AutoReg(df_train.beer, 12).fit()

And when I want to predict new values, I'm trying to follow the documentation:

y_pred = model.predict(start=df_test.index.min(), end=df_test.index.max())
# or
y_pred = model.predict(start=100, end=1000)

Both return a list of NaNs.

Also, model.predict(0, df_train.size - 1) returns real values, but model.predict(0, df_train.size) returns a list of NaNs.

Am I doing something wrong?

P.S. I know there are ARIMA, ARMA and SARIMAX models that can be used for basic autoregression, but I specifically need AutoReg.

Sandipan Dey · Answer 1 · 2021-11-08T06:07:15.707

We can do the forecasting in a couple of ways:

by directly using the predict() function, and
by using the definition of the AR(p) process and the parameters learned with AutoReg(); this is helpful for short-term predictions, as we shall see.
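
For reference, the AR(p) definition that the second approach relies on can be written as below. The symbols are notation chosen here for illustration: c is the intercept, the phi_i are the lag coefficients and epsilon_t is white noise. The second expression is the point forecast obtained by dropping the noise term.

```latex
y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \varepsilon_t,
\qquad
\hat{y}_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i}
```

With the default constant trend, the fitted parameter vector is ordered as the intercept followed by the coefficients for lags 1 to p, which is the ordering the manual loops further down assume.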

Let's start with a sample dataset from statsmodels. The data looks like the following:

import matplotlib.pyplot as plt
import statsmodels.api as sm
data = sm.datasets.sunspots.load_pandas().data['SUNACTIVITY']
plt.plot(range(len(data)), data)

Let's fit an AR(p) process to model the time series and use the partial autocorrelation (PACF) plot to find the order p.

The first few PACF values remain significant, so let's use p=10 for the AR(p) model.

Let's divide the data into training and validation (test) datasets and fit an autoregressive model of order 10 using the training data:

from statsmodels.tsa.ar_model import AutoReg
n = len(data)
ntrain = int(n*0.9)
ntest = n - ntrain
lag = 10
res = AutoReg(data[:ntrain], lags = lag).fit()

Now, use the predict() function for forecasting all values corresponding to the held-out dataset:

# end is inclusive, so end=n-1 yields exactly ntest forecasts
preds = res.model.predict(res.params, start=n-ntest, end=n-1)

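Notice that we can get exactly the same predictions using the parameters from the trained model, as shown below:

import numpy as np

params = np.asarray(res.params)  # [const, phi_1, ..., phi_lag]
# last `lag` training values; copy so that `data` itself is not modified
x = data[ntrain-lag:ntrain].to_numpy().copy()
preds1 = []
for t in range(ntrain, n):
    # AR(p) forecast: const + phi_1*y[t-1] + ... + phi_lag*y[t-lag]
    pred = params[0] + np.sum(params[1:]*x[::-1])
    # slide the window one step and append the forecast as the newest value
    x[:lag-1], x[lag-1] = x[1:].copy(), pred
    preds1.append(pred)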

Note that the forecast values generated this way are the same as the ones obtained using the predict() function above.

np.allclose(np.asarray(preds), np.array(preds1))
# True

Now, let's plot the forecast values for the test data:

As can be seen, the quality of long-term forecasting is not that good, since each forecast is fed back in as an input for the following steps and the errors accumulate.

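Let's instead go for short-term predictions now and use the last lag observed points from the dataset to forecast the next value, as shown in the next code snippet.

params = np.asarray(res.params)  # [const, phi_1, ..., phi_lag]
preds = []
for t in range(ntrain, n):
    # one-step-ahead: the inputs are the observed values data[t-lag:t], not earlier forecasts
    pred = params[0] + np.sum(params[1:]*data[t-lag:t].to_numpy()[::-1])
    preds.append(pred)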

As can be seen from the next plot, short term forecasting works way better:
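
The difference between the two forecasting modes can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions: the helper names (ar_predict, recursive_forecast, one_step_forecast) are hypothetical, the series is a simulated AR(1) process, and the true parameters are used in place of a fitted model. Note that the one-step-ahead mode reads observed values from the held-out period at every step, so it answers a different question than a multi-step forecast made at the end of training.

```python
import numpy as np

def ar_predict(params, window):
    """One AR(p) forecast from the last p values (`window` is oldest first)."""
    return params[0] + np.dot(params[1:], window[::-1])

def recursive_forecast(params, history, steps):
    """Multi-step forecast: each prediction is fed back as an input."""
    p = len(params) - 1
    window = list(history[-p:])
    out = []
    for _ in range(steps):
        pred = ar_predict(params, np.array(window))
        out.append(pred)
        window = window[1:] + [pred]   # drop the oldest value, append the forecast
    return np.array(out)

def one_step_forecast(params, series, start):
    """One-step-ahead forecast: inputs are always observed values."""
    p = len(params) - 1
    return np.array([ar_predict(params, series[t - p:t])
                     for t in range(start, len(series))])

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Simulated AR(1) series: y[t] = phi * y[t-1] + noise
rng = np.random.default_rng(0)
phi, n, ntrain = 0.9, 600, 400
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

params = np.array([0.0, phi])          # true parameters, assumed known here
rec = recursive_forecast(params, y[:ntrain], n - ntrain)
one = one_step_forecast(params, y, ntrain)

print("recursive RMSE:", rmse(rec, y[ntrain:]))
print("one-step RMSE: ", rmse(one, y[ntrain:]))
```

The recursive forecast decays toward the process mean as the horizon grows, while the one-step-ahead error stays close to the noise level.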

Maybe it works better because you are leaking the test set to predictions in here: np.sum(res.params[1:]*data[t-lag:t].values[::-1])? — ooxio, Aug 14 '22 at 17:15

score 1 · Answer 2 · edited Aug 27 '22 at 04:41

You can use this code for forecasting:

import statsmodels.api as sm

model = sm.tsa.AutoReg(df_train.beer, 12).fit()
y_pred = model.model.predict(model.params, start=df_test.index.min(), end=df_test.index.max())

answered Aug 16 '20 at 07:22 by Ivan Adanenko · edited Aug 27 '22 at 04:41 by Bill DeRose

This makes no sense. What if data doesn't have an index? Also this is identical to the original question. – Ash Apr 27 '23 at 15:43

score -2 · Answer 3 · answered Aug 25 '22 at 16:24

from statsmodels.tsa.ar_model import AutoReg

model=AutoReg(dataset[''],lags=1)
ARFit=model.fit()
forecasted=ARFit.predict(start=len(dataset),end=len(dataset)+12)

# visualization
dataset[''].plot(figsize=(12,8),legend=True)
forecasted.plot(legend=True)

answered Aug 25 '22 at 16:24 by judith angélica

This answer was reviewed in the [Low Quality Queue](https://stackoverflow.com/help/review-low-quality). Here are some guidelines for [How do I write a good answer?](https://stackoverflow.com/help/how-to-answer). Code only answers are **not considered good answers**, and are likely to be downvoted and/or deleted because they are **less useful** to a community of learners. It's only obvious to you. Explain what it does, and how it's different / **better** than existing answers. [From Review](https://stackoverflow.com/review/low-quality-posts/32572291) – Trenton McKinney Aug 25 '22 at 17:32
