I want to test for stationarity on a time series (nobs = 23) and implemented the adfuller test from statsmodels.tsa.stattools.
Here are the original data:
1995-01-01 3126.0
1996-01-01 3321.0
1997-01-01 3514.0
1998-01-01 3690.0
1999-01-01 3906.0
2000-01-01 4065.0
2001-01-01 4287.0
2002-01-01 4409.0
2003-01-01 4641.0
2004-01-01 4812.0
2005-01-01 4901.0
2006-01-01 5028.0
2007-01-01 5035.0
2008-01-01 5083.0
2009-01-01 5183.0
2010-01-01 5377.0
2011-01-01 5428.0
2012-01-01 5601.0
2013-01-01 5705.0
2014-01-01 5895.0
2015-01-01 6234.0
2016-01-01 6542.0
2017-01-01 6839.0
Here’s the custom ADF function I’m using (credit goes to this blog):
from statsmodels.tsa.stattools import adfuller
import pandas as pd

def test_stationarity(timeseries):
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC', maxlag=None)
    dfoutput = pd.Series(dftest[0:4], index=['ADF Statistic', 'p-value', '#Lags Used', 'Number of Obs Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)
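For reference, here is how I construct the series that gets passed to test_stationarity (a sketch, assuming the values are loaded from a plain list; the variable names are just illustrative):

```python
import pandas as pd

# The 23 annual observations posted above
values = [3126.0, 3321.0, 3514.0, 3690.0, 3906.0, 4065.0, 4287.0, 4409.0,
          4641.0, 4812.0, 4901.0, 5028.0, 5035.0, 5083.0, 5183.0, 5377.0,
          5428.0, 5601.0, 5705.0, 5895.0, 6234.0, 6542.0, 6839.0]

# Annual DatetimeIndex from 1995 through 2017
index = pd.to_datetime(['%d-01-01' % y for y in range(1995, 2018)])
series = pd.Series(values, index=index)
print(len(series))  # 23
```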
Here are the results of the ADF test on the original data:
ADF Statistic -0.126550
p-value 0.946729
#Lags Used 8.000000
Number of Obs Used 14.000000
Critical Value (1%) -4.012034
Critical Value (5%) -3.104184
Critical Value (10%) -2.690987
The ADF statistic is larger than all of the critical values, and the p-value exceeds alpha = 0.05, indicating the series is not stationary, so I perform a first differencing of the data. Here’s the differencing function and the results of the ADF test on the differenced series:
import pandas as pd

def difference(dataset):
    """Return the first difference (x[t] - x[t-1]) of the input sequence."""
    diff = list()
    for i in range(1, len(dataset)):
        value = dataset[i] - dataset[i - 1]
        diff.append(value)
    return pd.Series(diff)
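As a quick sanity check (not part of my pipeline), on the first few observations this matches pandas’ built-in Series.diff():

```python
import pandas as pd

# First four values of the series posted above
sample = pd.Series([3126.0, 3321.0, 3514.0, 3690.0])

# .diff() computes x[t] - x[t-1] and leaves a NaN in the first slot;
# dropping it gives the same result as the difference() function.
first_diff = sample.diff().dropna()
print(first_diff.tolist())  # [195.0, 193.0, 176.0]
```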
ADF Statistic -1.169799
p-value 0.686451
#Lags Used 9.000000
Number of Obs Used 12.000000
Critical Value (1%) -4.137829
Critical Value (5%) -3.154972
Critical Value (10%) -2.714477
The ADF statistic and p-value both improve, but the series still isn’t stationary, so I perform a second differencing. Again, here are the results:
ADF Statistic -0.000000
p-value 0.958532
#Lags Used 9.000000
Number of Obs Used 11.000000
Critical Value (1%) -4.223238
Critical Value (5%) -3.189369
Critical Value (10%) -2.729839
After a second differencing of the data, the ADF test statistic becomes -0.0000 (which is puzzling, since a print() of the unrounded value returns -0.0; either way it implies there’s some significant digit other than zero somewhere) and the p-value is now worse than it was in the beginning. I also receive this warning:
RuntimeWarning: divide by zero encountered in double_scalars
return np.dot(wresid, wresid) / self.df_resid.
A grid search of the p, d, q values returns an ARIMA(1, 1, 0) model, but I assumed that a second differencing would still be necessary since the first differencing did not achieve stationarity.
I suspect the strange test statistic and p-value are due to the small sample size and the high number of lags used by the ADF test’s default setting (maxlag=None). I understand that when maxlag is set to None, statsmodels uses the formula int(np.ceil(12. * np.power(nobs/100., 1/4.))).
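Plugging my sample size into that formula (just verifying the arithmetic):

```python
import numpy as np

nobs = 23
# The statsmodels default when maxlag=None (Schwert's rule of thumb)
maxlag = int(np.ceil(12. * np.power(nobs / 100., 1 / 4.)))
print(maxlag)  # 9
```

So with only 23 observations the test is allowed up to 9 lags, which leaves very few residual degrees of freedom in the regression.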
Is this appropriate? If not, is there any workaround for data sets with small numbers of observations, or a rule of thumb for manually setting the maxlag value in the ADF function to avoid what appears to be an erroneous test statistic? I searched here, here, and here but couldn’t find a solution.
I’m using statsmodels version 0.8.0.