I am trying to build a forecasting model with statsmodels' SARIMAX in Python (a regression with SARIMA errors), and I need some guidance on how exogenous variables passed through the exog argument are handled.
The signature, with its default parameters, is:
SARIMAX(endog, exog=None, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, measurement_error=False,
time_varying_regression=False, mle_regression=True, simple_differencing=False, enforce_stationarity=True,
enforce_invertibility=True, hamilton_representation=False, concentrate_scale=False, trend_offset=1,
use_exact_diffuse=False, dates=None, freq=None, missing='none', validate_specification=True, **kwargs)
This is how I fitted my model:
Note: I did not transform (e.g. difference) endog or exog before passing them to SARIMAX.
from statsmodels.tsa.statespace.sarimax import SARIMAX
results = SARIMAX(endog, exog=exog['TMIN_IAC'], order=(0, 1, 1), seasonal_order=(0, 0, 0, 0), trend='c').fit()
And this is the resulting summary:
                               SARIMAX Results
==============================================================================
Dep. Variable:                 all   No. Observations:                     151
Model:            SARIMAX(0, 1, 1)   Log Likelihood                   -624.229
Date:             Mon, 05 Apr 2021   AIC                              1256.457
Time:                     14:36:48   BIC                              1268.500
Sample:                 01-31-2001   HQIC                             1261.350
                      - 07-31-2013
Covariance Type:               opg
==============================================================================
                   coef    std err          z      P>|z|     [0.025     0.975]
------------------------------------------------------------------------------
intercept        0.2139      0.071      2.996      0.003      0.074      0.354
TMIN_IAC        -6.1222      0.474    -12.920      0.000     -7.051     -5.193
ma.L1           -0.9504      0.029    -33.060      0.000     -1.007     -0.894
sigma2         237.3801     33.036      7.185      0.000    172.631    302.130
===================================================================================
Ljung-Box (L1) (Q):                   0.25   Jarque-Bera (JB):                 2.21
Prob(Q):                              0.62   Prob(JB):                         0.33
Heteroskedasticity (H):               1.26   Skew:                            -0.08
Prob(H) (two-sided):                  0.42   Kurtosis:                         2.43
===================================================================================
I searched the documentation, but the closest thing to my question that I found is this passage:
"If simple_differencing = True is used, then the endog and exog data are differenced prior to putting the model in state-space form. This has the same effect as if the user differenced the data prior to constructing the model, which has implications for using the results..."
My concern is that, according to Alan Pankratz in his book Forecasting With Dynamic Regression Models (1991), if differencing is applied to the error term of a multiple regression, both the dependent and the explanatory variables should be differenced, and I am not certain that statsmodels does this automatically when d > 0 in order.
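For reference, as I read the model form stated in the statsmodels SARIMAX documentation (regression on the levels, with the SARIMA process on the error term u_t; this assumes the defaults mle_regression=True and simple_differencing=False), the specification above with order=(0, 1, 1) and trend='c' can be written as follows, where c is the intercept and θ₁ is ma.L1 (statsmodels writes the MA polynomial as 1 + θ₁L):

$$
\begin{aligned}
y_t &= \beta\, x_t + u_t, \\
(1 - L)\, u_t &= c + (1 + \theta_1 L)\, \varepsilon_t .
\end{aligned}
$$

Applying (1 - L) to the first line and substituting the second gives

$$
(1 - L)\, y_t = \beta\, (1 - L)\, x_t + c + (1 + \theta_1 L)\, \varepsilon_t ,
\qquad\text{i.e.}\qquad
\Delta y_t = c + \beta\, \Delta x_t + \varepsilon_t + \theta_1 \varepsilon_{t-1}.
$$

If that reading is right, the same β multiplies Δx_t in the differenced equation, so the levels specification corresponds to differencing both y_t and x_t, and c acts as a drift term. This is only a sketch of the algebra implied by the documented model form, not a statement about how statsmodels computes the likelihood internally.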