
After training a SARIMAX model, I had hoped to be able to perform future forecasts with new observations without having to retrain it. However, I noticed that the number of observations I use in the newly applied forecast changes the predictions.

From my understanding, provided enough observations are given for the autoregressive and moving-average terms to be calculated correctly, the model would not even use the earlier historic observations to inform itself, since the coefficients are not being retrained. In a (3,0,1) example I would have thought it needs at least 3 observations to apply its trained coefficients. However, this does not seem to be the case, and I am questioning whether I have understood the model correctly.

As an example and test, I applied a trained SARIMAX to the exact same data with the initial few observations removed, to test the effect of the number of rows on the prediction, with the following code:

import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

y = [348, 363, 435, 491, 505, 404, 359, 310, 337, 360, 342, 406, 396, 420, 472, 548, 559, 463, 407, 362, 405, 417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390, 432]
ynew = y[10:]  # the same series with the first 10 observations dropped
print(ynew)

model = SARIMAX(endog=y, order=(3, 0, 1))
model = model.fit()
print(model.params)

pred1 = model.predict(start=len(y), end=len(y) + 7)

# apply the already-fitted parameters to the shorter series (no re-estimation)
model2 = model.apply(ynew)
print(model2.params)  # identical to model.params

pred2 = model2.predict(start=len(ynew), end=len(ynew) + 7)
print(pd.DataFrame({'pred1': pred1, 'pred2': pred2}))

The results are as follows:

   pred1       pred2
0  472.246996  472.711770
1  494.753955  495.745968
2  498.092585  499.427285
3  489.428531  490.862153
4  477.678527  479.035869
5  469.023243  470.239459
6  465.576002  466.673790
7  466.338141  467.378903

Based on this, it seems that if I were to produce a forecast from a trained model with new observations, the number of observations itself would affect the integrity of the forecast.

What is the explanation for this? And what is the standard practice for applying a trained model to new observations, given that their number may change?

If I wanted to update the model but could not guarantee that I had all of the original observations from the very start of my training set, this test would indicate that my forecast might as well be random numbers.

Saleem Khan

1 Answer


Main issue

The main problem here is that you are not using your new results object (model2) for your second set of predictions. You have:

pred2 = model.predict(start=len(ynew), end = len(ynew)+7)

but you should have:

pred2 = model2.predict(start=len(ynew), end = len(ynew)+7)

If you fix this, you get very similar predictions:

      pred1       pred2
0  472.246996  472.711770
1  494.753955  495.745968
2  498.092585  499.427285
3  489.428531  490.862153
4  477.678527  479.035869
5  469.023243  470.239459
6  465.576002  466.673790
7  466.338141  467.378903

To understand why they're not identical, there is a second issue (which is not a problem in your code, but just a statistical feature of your data/model).

Secondary issue

Your estimated parameters imply an extremely persistent model:

print(model.params)

gives

ar.L1        2.134401
ar.L2       -1.683946
ar.L3        0.549369
ma.L1       -0.874801
sigma2    1807.187815

which is associated with a near-unit-root process (largest eigenvalue magnitude = 0.99957719).

What this means is that it takes a very long time for the effects of a particular datapoint on the forecast to die out. In your case, this just means that there are still small effects on the forecasts from the first 10 periods.

This isn't a problem, it's just the way this particular estimated model works.
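You can check this persistence yourself. A minimal sketch, plugging the ar.L1–ar.L3 estimates printed above into the AR companion matrix and inspecting its eigenvalues with NumPy:

```python
import numpy as np

# AR coefficients as printed above (ar.L1, ar.L2, ar.L3)
ar = [2.134401, -1.683946, 0.549369]

# Companion matrix of the AR(3) part; the largest eigenvalue magnitude
# measures persistence (1.0 would be an exact unit root)
companion = np.array([ar, [1, 0, 0], [0, 1, 0]], dtype=float)

largest = np.abs(np.linalg.eigvals(companion)).max()
print(largest)  # roughly 0.9996
```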

cfulton
  • 2,855
  • 2
  • 14
  • 13
  • Thanks for spotting the coding error. I have removed it to isolate the issue that I wanted to demonstrate in the example. I am struggling to understand the statistical feature you have pointed out. Could you point me to something that explains the near-unit-root process and how early observations can influence a forecast with a small order? My (obviously incorrect) understanding has always been that the model would only look back as far as the p, d, q point it to – Saleem Khan Feb 24 '22 at 12:16
  • When you have MA terms in an ARMA, the optimal prediction depends on the pure AR representation of the ARMA. This is because the errors are not observed, and so you cannot only use the final values of Y and the final error to make the prediction. This requires inverting the MA into a T-lag AR model so that the forecast at time T+1 can be computed. When you invert this MA into a long-lag AR you end up with very large coefficients on distant lags. Hence why whether you include distant Y or not has an impact on the forecast. If you dropped the MA, you would not have this issue. – Kevin S Feb 24 '22 at 16:44
  • @KevinS thanks for the explanation. Do you know of any resources that discuss this or best practices in relation to this? – Saleem Khan Feb 24 '22 at 21:56
  • Any decent time series textbook will discuss inverting MA processes. Hamilton or Brockwell and Davis come to mind. The forecasts you get from these are "optimal" in the sense that they minimize the MSE given the data. Theoretically, longer time series will get you closer to the optimal forecast and if your time series is long enough, adding values early in the sample will have a near-0 effect on forecasts even when you have an MA term. – Kevin S Feb 25 '22 at 12:54
  • You can play around with this using [`ArmaProcess.arma2ar`](https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima_process.ArmaProcess.arma2ar.html#statsmodels.tsa.arima_process.ArmaProcess.arma2ar) – Kevin S Feb 25 '22 at 12:55
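The `arma2ar` suggestion can be sketched as follows, using the parameter estimates from the answer above (note the lag-polynomial convention assumed here: a leading 1 with negated AR signs, which is worth double-checking against the statsmodels docs):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# Lag polynomials: [1, -ar.L1, -ar.L2, -ar.L3] and [1, ma.L1]
ar_poly = np.array([1, -2.134401, 1.683946, -0.549369])
ma_poly = np.array([1, -0.874801])

proc = ArmaProcess(ar_poly, ma_poly)
ar_rep = proc.arma2ar(lags=30)  # pure AR(30) approximation of the ARMA(3,1)

# The weights on distant lags shrink only like |ma.L1|**k, so observations
# 20+ periods back still receive non-negligible weight in the forecast
print(ar_rep[:4])
print(ar_rep[-4:])
```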