0

So I would like to ask if anyone has experiences with sarimax function in python

When using model with differences or seasonal differences, residuals seem to be bit wierd.

model2 = SARIMAX(np.log(train["S4236SM144NCEN"]), order=(4, 0, 0), seasonal_order=(1, 1, 0, 12) );
model_results2 = model2.fit(maxiter = 500);
print(model_results2.summary());

then ploting standardized residuals:

model_results2.plot_diagnostics(figsize = (12, 7), lags=20);

These residual (above code) are what they should be I suppose but when doing

model_results2.resid.plot()

and these residuals using just function .resid.plot() have very weird trajectory.

So What I did is I took residuals from 13th observation when doing seasonal differences, using code:

resid2 = pd.DataFrame(model_results2.resid.iloc[12:])
resid2.plot()

When plotting residuals from this code they look exactly the same as with using .plot_diagnostic() - So I suppose this is the right thing.

My question is what residuals object I should Use for testing for Autocorrelation, Arch effects, Normality..., Because using 2 different residuals objects from the same model will results in complete different resid. diagnostics.

name = ['Statistic', 'p-value']
lzip(name, acorr_ljungbox(model_results2.resid, lags = 12))

name = ['Lagrange multiplier statistic', 'p-value',
       'F-TEST', 'F-p-value']
diag = het_arch(model_results2.resid, maxlag=12)
lzip(name, diag)

have completely different results than taking from 13th observation:

name = ['Statistic', 'p-value']
lzip(name, acorr_ljungbox(model_results2.resid.iloc[13:], lags = 12))

name = ['Lagrange multiplier statistic', 'p-value',
       'F-TEST', 'F-p-value']
diag = het_arch(model_results2.resid.iloc[13:], maxlag=12)
lzip(name, diag)

Code for downloading data:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='ticks', context='poster')
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.seasonal import seasonal_decompose
import seaborn as sns
#plt.style.use("ggplot")
import pandas_datareader.data as web
import pandas as pd
import statsmodels.api as sm
import scipy
import statsmodels.stats.api as sms
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import datetime

start = datetime.datetime(1992, 1, 1)
end = datetime.datetime(2019, 10, 1)
df = web.DataReader(["TOTALSA", "S4236SM144NCEN"], "fred", start, end)
df.columns = ['TOTALSA', "S4236SM144NCEN"]
df.head()

train = df3.iloc[:len(df) - 12*2]
test = df3.iloc[len(df) - 12*2:]
df_train_test = pd.concat([test["S4236SM144NCEN"], train["S4236SM144NCEN"]], axis = 1)
df_train_test.columns = ["TEST", "TRAIN"]

Has anyone ever encountered this issue?

Petr
  • 1,606
  • 2
  • 14
  • 39

1 Answers1

0

You have to use the studentized residuals. model_results2.resid is not studentized. Check out this example: https://stackoverflow.com/a/63938424/14294235

ClemensHa
  • 11
  • 1