Do we need to difference exogenous variables before passing them to the exog argument of SARIMAX() from statsmodels in Python?

Question

I am trying to build a forecasting model using SARIMAX in Python (regression with SARIMA errors) and need some guidance on how exogenous variables are handled in the exog argument.

The default parameters are:

SARIMAX(endog, exog=None, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, measurement_error=False, 
time_varying_regression=False, mle_regression=True, simple_differencing=False, enforce_stationarity=True,
enforce_invertibility=True, hamilton_representation=False, concentrate_scale=False, trend_offset=1,
use_exact_diffuse=False, dates=None, freq=None, missing='none', validate_specification=True, **kwargs)

This is how I fitted my model:

Note: I did not transform the variables before passing endog and exog to SARIMAX.

SARIMAX(endog, exog=exog['TMIN_IAC'], order=(0, 1, 1), seasonal_order=(0, 0, 0, 0), trend='c')

And this is the resultant summary:

                                SARIMAX Results                                
==============================================================================
Dep. Variable:                    all   No. Observations:                  151
Model:               SARIMAX(0, 1, 1)   Log Likelihood                -624.229
Date:                Mon, 05 Apr 2021   AIC                           1256.457
Time:                        14:36:48   BIC                           1268.500
Sample:                    01-31-2001   HQIC                          1261.350
                         - 07-31-2013                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept      0.2139      0.071      2.996      0.003       0.074       0.354
TMIN_IAC      -6.1222      0.474    -12.920      0.000      -7.051      -5.193
ma.L1         -0.9504      0.029    -33.060      0.000      -1.007      -0.894
sigma2       237.3801     33.036      7.185      0.000     172.631     302.130
===================================================================================
Ljung-Box (L1) (Q):                   0.25   Jarque-Bera (JB):                 2.21
Prob(Q):                              0.62   Prob(JB):                         0.33
Heteroskedasticity (H):               1.26   Skew:                            -0.08
Prob(H) (two-sided):                  0.42   Kurtosis:                         2.43
===================================================================================

I searched the documentation, but the closest thing I found to my question is this:

If simple_differencing = True is used, then the endog and exog data are differenced prior to putting the model in state-space form. This has the same effect as if the user differenced the data prior to constructing the model, which has implications for using the results

My concern is that, according to Alan Pankratz in his book Forecasting With Dynamic Regression Models (1991), if differencing is applied to the errors of a multiple regression, both the dependent and the explanatory variables should be differenced, and I am not certain statsmodels does that automatically.
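
To make the concern concrete, here is a sketch of the algebra for a single regressor. The notation is an assumption for illustration: B is the backshift operator, n_t is the regression error, the MA sign follows the statsmodels convention, and the constant is left out for brevity. Applying the differencing operator to the whole regression equation shows why y and x have to be differenced together:

```latex
\begin{aligned}
y_t &= \beta x_t + n_t, \qquad (1 - B)\, n_t = (1 + \theta B)\, \varepsilon_t \\
(1 - B)\, y_t &= \beta\, (1 - B)\, x_t + (1 - B)\, n_t \\
\Delta y_t &= \beta\, \Delta x_t + \varepsilon_t + \theta\, \varepsilon_{t-1}
\end{aligned}
```

The same coefficient \beta appears in the levels and in the differenced form, so differencing only y (or only x) would describe a different model.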

score 0 · Answer 1 · answered Apr 06 '21 at 13:51

It seems that SARIMAX from statsmodels also differences both the response and the exog variables automatically.

According to Rob Hyndman, author of the Arima function in the forecast package in R:

Arima will difference both the response variable and the xreg variables as specified in the order and seasonal arguments. You should never need to do the differencing yourself.

So I ran the same model in R and obtained essentially the same results:

Arima(endog, order = c(0,1,1),seasonal = c(0,0,0), xreg = exog, include.drift = TRUE,
lambda = NULL, method = 'ML')

Model summary:

Regression with ARIMA(0,1,1) errors 

Coefficients:
          ma1   drift  TMIN_IAC
      -0.9504  0.2139   -6.1219
s.e.   0.0381  0.0724    0.4763

sigma^2 estimated as 242.2:  log likelihood=-624.23
AIC=1256.47   AICc=1256.74   BIC=1268.51
