Overview
I am trying to implement autoregressive moving average (ARMA) parameter optimization using maximum likelihood estimation (MLE) via the Kalman filter. I know that I can fit ARMA models using the statsmodels package in Python, but I want to write my own implementation of the ARMA likelihood and the subsequent optimization as a prototype for a future C/C++ implementation. Also, when I look through the statsmodels documentation, I find that the statsmodels Kalman filter log-likelihood implements a slightly different expression than the one I have found in the literature.
Algorithms
In order to calculate the ARMA log likelihood, I am following the 1980 paper by Pearlman:
Pearlman, J. G. "An algorithm for the exact likelihood of a high-order autoregressive-moving average process." Biometrika 67.1 (1980): 232-233. Available from JSTOR.
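For reference, the quantity my __likelihood method below evaluates (dropping additive constants) is the concentrated log likelihood; in LaTeX notation,

\ell(\phi, \theta) = -\frac{1}{2} \sum_{t=1}^{n} \log R_t \;-\; \frac{n}{2} \log\!\left( \sum_{t=1}^{n} \frac{a_t^2}{R_t} \right),

where a_t are the one-step prediction errors and R_t are the corresponding scaled prediction error variances produced by the filter recursion.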
In order to calculate the initial P matrix, I am following an algorithm in
Gardner, G., A. C. Harvey, and G. D. A. Phillips. "Algorithm AS 154: An algorithm for exact maximum likelihood estimation of autoregressive-moving average models by means of Kalman filtering." Journal of the Royal Statistical Society. Series C (Applied Statistics) 29.3 (1980): 311-322. Available from JSTOR.
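Since P = TPT' + RR' is a discrete Lyapunov equation, a quick cross-check of my __initial_P routine (just a sketch, not something from the papers above) is to compare against SciPy's built-in solver:

import numpy as np
import scipy.linalg

def initial_P_check(R, T):
    # P = T P T' + R R' is a discrete Lyapunov equation, so SciPy's solver
    # should reproduce the P obtained from the vec/Kronecker solve in __initial_P
    return scipy.linalg.solve_discrete_lyapunov(T, np.outer(R, R))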
For the initial parameter values, I am currently using the internal method that statsmodels ARMA models use to compute the initial guess for ARMA parameters. In the future I plan to move to my own implementation, but I am using _fit_start_params_hr while I debug my MLE.
For the optimization itself, I am simply using the L-BFGS-B solver in SciPy.
Code
import numpy as np
import statsmodels.api as sm
import statsmodels.tsa.arima_model
import scipy.optimize
class ARMA(object):
    def __init__(self, endo, nar, nma):
        np.random.seed(0)
        # endogenous variables
        self.endo = endo
        # Number of AR terms
        self.nar = nar
        # Number of MA terms
        self.nma = nma
        # "Dimension" of the ARMA fit
        self.dim = max(nar, nma+1)
        # Current ARMA parameters
        self.params = np.zeros(self.nar+self.nma, dtype='float')
    def __g(self, ma_params):
        '''
        Build MA parameter vector
        '''
        g = np.zeros(self.dim, dtype='float')
        g[0] = 1.0
        if self.nma > 0:
            g[1:self.nma+1] = ma_params
        return g
    def __F(self, ar_params):
        '''
        Build AR parameter matrix
        '''
        F = np.zeros((self.dim, self.dim), dtype='float')
        F[:self.nar, 0] = ar_params
        for i in xrange(1, self.dim):
            F[i-1, i] = 1.0
        return F
    def __initial_P(self, R, T):
        '''
        Solve for the initial P matrix
        Solves P = TPT' + RR' by stacking columns:
        vec(P) = (I - kron(T, T))^-1 vec(RR')
        '''
        S = np.identity(self.dim**2, dtype='float') - np.kron(T, T)
        V = np.outer(R, R).ravel('F')
        Pmat = np.linalg.solve(S, V).reshape(self.dim, self.dim, order='F')
        return Pmat
    def __likelihood(self, params):
        '''
        Compute the log likelihood for a parameter vector
        Implements the Pearlman 1980 algorithm
        '''
        # these checks are pilfered from statsmodels
        if self.nar > 0 and not np.all(np.abs(np.roots(np.r_[1, -params[:self.nar]])) < 1):
            print 'AR coefficients are not stationary'
        if self.nma > 0 and not np.all(np.abs(np.roots(np.r_[1, -params[-self.nma:]])) < 1):
            print 'MA coefficients are not invertible'
        ar_params = params[:self.nar]
        ma_params = params[-self.nma:]
        g = self.__g(ma_params)
        F = self.__F(ar_params)
        w = self.endo
        P = self.__initial_P(g, F)
        n = len(w)
        z = np.zeros(self.dim, dtype='float')
        R = np.zeros(n, dtype='float')
        a = np.zeros(n, dtype='float')
        K = np.dot(F, P[:, 0])
        L = K.copy()
        R[0] = P[0, 0]
        for i in xrange(1, n):
            a[i-1] = w[i-1] - z[0]
            z = np.dot(F, z) + K*(a[i-1]/R[i-1])
            Kupdate = -(L[0]/R[i-1])*np.dot(F, L)
            Rupdate = -L[0]*L[0]/R[i-1]
            P -= np.outer(L, L)/R[i-1]
            L = np.dot(F, L) - (L[0]/R[i-1])*K
            K += Kupdate
            R[i] = R[i-1] + Rupdate
            # once R has converged to one, K, L and R stop changing, so the rest
            # of the innovations can be generated in the cheaper loop below
            if np.abs(R[i] - 1.0) < 1e-9:
                R[i:] = 1.0
                break
        for j in xrange(i, n):
            a[j] = w[j] - z[0]
            z = np.dot(F, z) + K*(a[j]/R[j])
        likelihood = 0.0
        for i in xrange(n):
            likelihood += np.log(R[i])
        likelihood *= -0.5
        ssum = 0.0
        for i in xrange(n):
            ssum += a[i]*a[i]/R[i]
        likelihood += -0.5*n*np.log(ssum)
        return likelihood
    def fit(self):
        '''
        Fit the ARMA model by maximizing the log likelihood
        Uses scipy.optimize.minimize on the negative log likelihood
        '''
        sm_arma = statsmodels.tsa.arima_model.ARMA(endog=self.endo, order=(self.nar, self.nma, 0))
        params = statsmodels.tsa.arima_model.ARMA._fit_start_params_hr(sm_arma, order=(self.nar, self.nma, 0))
        # L-BFGS-B is a minimizer, so hand it the negative of the log likelihood
        opt = scipy.optimize.minimize(fun=lambda p: -self.__likelihood(p), x0=params, method='L-BFGS-B')
        print opt
# Test the code on statsmodels sunspots data
nar = 2
nma = 1
endo = sm.datasets.sunspots.load_pandas().data['SUNACTIVITY'].tolist()
arma = ARMA(endo=endo, nar=nar, nma=nma)
arma.fit()
Issues
I find that the above example does not converge. On the third call of ARMA.__likelihood, the code emits the following warning:
RuntimeWarning: invalid value encountered in log
likelihood += np.log(R[i])
which happens because ARMA.__initial_P solves for a matrix with P[0][0] < 0.0. At this point, the current estimates of the AR parameters have become non-stationary, and all subsequent iterations warn that the AR parameters are non-stationary and the MA parameters are non-invertible.
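If it helps to localize the failure: my understanding is that P = TPT' + RR' is only guaranteed to have a positive semidefinite solution when all eigenvalues of T lie strictly inside the unit circle, so a guard along these lines (just a sketch using plain numpy) flags the bad trial points before the logarithm is ever taken:

import numpy as np

def companion_is_stationary(F):
    # the Lyapunov solve only yields a valid (positive semidefinite) P when the
    # companion matrix F is stable, i.e. all eigenvalues strictly inside the unit circle
    return np.max(np.abs(np.linalg.eigvals(F))) < 1.0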
Questions
Is this implementation correct? I have checked that the initial P matrix satisfies the equation it is supposed to satisfy. For the likelihood calculation, I see several behaviors that I expect from the Pearlman paper:
- R tends to one. For a pure AR process with p AR parameters, it reaches this limit in p steps; in practice, the break statement in the __likelihood function kicks in after p iterations of the Pearlman recursion.
- L tends to the zero vector.
- K tends to F·g. I check this by looking at abs(K - F·g) while calculating the likelihood (the checks are sketched just after this list).
- After the warning about the negative value in the logarithm, the above limits are no longer obeyed.
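Concretely, the checks I am describing amount to something like this inside the filter loop (a sketch; tol is an arbitrary tolerance I picked):

import numpy as np

def pearlman_limits_reached(F, g, K, L, R_i, tol=1e-9):
    # the limits I expect from the Pearlman recursion once the filter has converged:
    # K -> F.g, L -> 0 and R -> 1
    return (np.all(np.abs(K - np.dot(F, g)) < tol)
            and np.all(np.abs(L) < tol)
            and abs(R_i - 1.0) < tol)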
I have also tried implementing a transformation of the ARMA parameters to prevent overflow/underflow, as recommended in
Jones, Richard H. "Maximum likelihood fitting of ARMA models to time series with missing observations." Technometrics 22.3 (1980): 389-395. Available from JSTOR.
This transformation seemed to have no effect on the errors I observed.
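For reference, the AR half of that transformation can be sketched as follows: squash each unconstrained value into (-1, 1) to get partial autocorrelations, then run a Durbin-Levinson-style recursion to recover stationary AR coefficients. Jones uses x/sqrt(1 + x^2) as the squashing function; a logistic squash behaves similarly. This is only a sketch, not my exact code:

import numpy as np

def transform_ar(unconstrained):
    # map unconstrained reals to the coefficients of a stationary AR model
    pacf = (1.0 - np.exp(-unconstrained)) / (1.0 + np.exp(-unconstrained))  # values in (-1, 1)
    ar = pacf.copy()
    for j in range(1, len(ar)):
        # Durbin-Levinson-style update: phi_k <- phi_k - pacf_j * phi_(j-k)
        ar[:j] = ar[:j] - ar[j] * ar[:j][::-1]
    return ar

The MA part uses the analogous recursion (with the opposite sign in the update) to keep the MA polynomial invertible.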
If the implementation is correct, then how do I handle the negative R values? The issue seems to arise when scipy.optimize tries a parameter vector whose corresponding P matrix has a negative top-left diagonal element. Is the optimization supposed to be bounded to prevent negative R values? I have also tried using complex logarithms for negative values, as well as changing all numpy dtype parameters to 'complex'. For example:
def complex_log(val):
    '''
    Complex logarithm for negative values
    Returns log(val) + I*pi
    '''
    if val < 0.0:
        return complex(np.log(np.abs(val)), np.pi)
    return np.log(val)
However, scipy.optimize cannot handle complex-valued functions, so this supposed fix has not worked so far. Any recommendations for preventing or handling these behaviors?
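For concreteness, the kind of bounding I have in mind would be to penalize the objective directly rather than constrain the solver; a sketch, reusing the same root checks as in __likelihood above ('loglikelihood' stands in for that method):

import numpy as np

def penalized_neg_loglike(params, nar, nma, loglikelihood, penalty=1e12):
    # return a large finite value for non-stationary/non-invertible trial parameters
    # so that an unconstrained minimizer such as L-BFGS-B backs away from them
    if nar > 0 and not np.all(np.abs(np.roots(np.r_[1, -params[:nar]])) < 1):
        return penalty
    if nma > 0 and not np.all(np.abs(np.roots(np.r_[1, -params[-nma:]])) < 1):
        return penalty
    return -loglikelihood(params)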
Thanks for reading this far. Any help is much appreciated.