3

Overview

I am trying to implement autoregressive moving average (ARMA) parameter optimization using maximum likelihood estimation (MLE) via the Kalman Filter. I know that I can fit ARMA models using the statsmodels package in Python, but I want to write my own implementation of the ARMA likelihood and subsequent optimization as a prototype for a future C/C++ implementation. Also, when I look through the statsmodels documentation, I find that the statsmodels Kalman Filter Log Likelihood implements a slightly different expression than I have found in the literature.

Algorithms

In order to calculate the ARMA log likelihood, I am following the 1980 paper by Pearlman:

Pearlman, J. G. "An algorithm for the exact likelihood of a high-order autoregressive-moving average process." Biometrika 67.1 (1980): 232-233.). Available from JSTOR.

In order to calculate the initial P matrix, I am following an algorithm in

Gardner, G., Andrew C. Harvey, and Garry DA Phillips. "Algorithm AS 154: An algorithm for exact maximum likelihood estimation of autoregressive-moving average models by means of Kalman filtering." Journal of the Royal Statistical Society. Series C (Applied Statistics) 29.3 (1980): 311-322. Available from JSTOR.

For the initial parameter values, I am currently using the internal method that statsmodels ARMA models use to compute the initial guess for ARMA parameters. In the future I plan to move to my own implementation, but I am using _fit_starts_params while I debug my MLE.

For optimizing the MLE, I am simply using the L-BFGS solver in Scipy.

Code

import numpy as np
import statsmodels.api as sm
import statsmodels.tsa.arima_model
import scipy.optimize

class ARMA(object):
    def __init__(self, endo, nar, nma):
        np.random.seed(0)
        # endogenous variables
        self.endo = endo
        # Number of AR terms
        self.nar = nar
        # Number of MA terms
        self.nma = nma
        # "Dimension" of the ARMA fit
        self.dim = max(nar, nma+1)
        # Current ARMA parameters
        self.params = np.zeros(self.nar+self.nma, dtype='float')

    def __g(self, ma_params):
        '''
        Build MA parameter vector
        '''
        g = np.zeros(self.dim, dtype='float')
        g[0] = 1.0
        if self.nma > 0:
            g[1:self.nma+1] = ma_params
        return g

    def __F(self, ar_params):
        '''
        Build AR parameter matrix
        '''
        F = np.zeros((self.dim, self.dim), dtype='float')
        F[:self.nar, 0] =  ar_params
        for i in xrange(1, self.dim):
            F[i-1, i] = 1.0
        return F

    def __initial_P(self, R, T):
        '''
        Solve for initial P matrix

        Solves P = TPT' + RR'
        '''
        v = np.zeros(self.dim*self.dim, dtype='float')
        for i in xrange(self.dim):
            for j in xrange(self.dim):
                v[i+j*self.dim] = R[i]*R[j]
        R = np.array([R])
        S = np.identity(self.dim**2, dtype='float')-np.kron(T, T)
        V = np.outer(R, R).ravel('F')
        Pmat = np.linalg.solve(S,V).reshape(self.dim, self.dim, order='F')
        return Pmat

    def __likelihood(self, params):
        '''
        Compute log likehood for a parameter vector

        Implements the Pearlman 1980 algorithm
        '''
        # these checks are pilfered from statsmodels
        if self.nar > 0 and not np.all(np.abs(np.roots(np.r_[1, -params[:self.nar]]))<1):
            print 'AR coefficients are not stationary'
        if self.nma > 0 and not np.all(np.abs(np.roots(np.r_[1, -params[-self.nma:]]))<1):
           print 'MA coefficients are not stationary'
        ar_params = params[:self.nar]
        ma_params = params[-self.nma:]
        g = self.__g(ma_params)
        F = self.__F(ar_params)
        w = self.endo
        P = self.__initial_P(g, F)
        n = len(w)
        z = np.zeros(self.dim, dtype='float')
        R = np.zeros(n, dtype='float')
        a = np.zeros(n, dtype='float')
        K = np.dot(F, P[:, 0])
        L = K.copy()
        R[0] = P[0, 0]
        for i in xrange(1, n):
            a[i-1] = w[i-1] - z[0]
            z = np.dot(F, z) + K*(a[i-1]/R[i-1])
            Kupdate = -(L[0]/R[i-1])*np.dot(F, L)
            Rupdate = -L[0]*L[0]/R[i-1]
            P -= np.outer(L, L)/R[i-1]
            L = np.dot(F, L) - (L[0]/R[i-1])*K
            K += Kupdate
            R[i] = R[i-1] + Rupdate
            if np.abs(R[i] - 1.0) < 1e-9:
                R[i:] = 1.0
                break
        for j in xrange(i, n):
            a[j] = w[j] - z[0]
            z = np.dot(F, z) + K*(a[i-1]/R[i-1])
        likelihood = 0.0
        for i in xrange(n):
            likelihood += np.log(R[i])
        likelihood *= -0.5
        ssum = 0.0
        for i in xrange(n):
            ssum += a[i]*a[i]/R[i]
        likelihood += -0.5*n*np.log(ssum)
        return likelihood

    def fit(self):
        '''
        Fit the ARMA model by minimizing the loglikehood

        Uses scipy.optimize.minimize
        '''
        sm_arma = statsmodels.tsa.arima_model.ARMA(endog=self.endo, order=(self.nar, self.nma, 0))
        params = statsmodels.tsa.arima_model.ARMA._fit_start_params_hr(sm_arma, order=(self.nar, self.nma, 0))
        opt = scipy.optimize.minimize(fun=self.__likelihood, x0=params, method='L-BFGS-B')
        print opt

# Test the code on statsmodels sunspots data
nar = 2
nma = 1

endo = sm.datasets.sunspots.load_pandas().data['SUNACTIVITY'].tolist()
arma = ARMA(endo=endo, nar=nar, nma=nma)
arma.fit()

Issues

I find that the above example does not converge. In the third call of ARMA._likelihood, the code throws the following warning:

RuntimeWarning: invalid value encountered in log
  likelihood += np.log(R[i])

which happens because ARMA._initial_P solves for a matrix where P[0][0] < 0.0. At this point, the current estimates of the AR parameters become non-stationary. All subsequent iterations then warn that the AR and MA parameters are non-stationary.

Questions

  1. Is this implementation correct? I have checked that the initial P matrix satisfies the equation it is supposed to satisfy. For the likelihood calculation, I see several behaviors that I expect from the Pearlman paper:

    • R tends to one. For a pure AR process with p AR parameters, it achieves this limit in p steps. Basically, the break statement in the _likelihood function comes into effect after p iterations of the Pearlman algorithm steps.
    • L tends to the zero vector.
    • K tends to F.g. I check this by looking at abs(K - F.g) while calculating the likelihood.
    • After the warning about the negative value in the logarithm, the above limits are no longer obeyed.

I have also tried implementing a transformation of the ARMA parameters to prevent overflow/underflow, as recommended in

Jones, Richard H. "Maximum likelihood fitting of ARMA models to time series with missing observations." Technometrics 22.3 (1980): 389-395. Available from JSTOR.

This transformation seemed to no effect on the errors I observed.

  1. If the implementation is correct, then how do I handle the negative R values? The issue seems to arise when scipy.optimize returns a parameter vector that corresponds to a P matrix for which the top diagonal element is negative. Is the optimization routine supposed to be bounded to prevent negative R values? I have also tried using complex logarithms for negative values as well as changing all numpy dtype parameters to 'complex'. For example:

    def complex_log(val):
        '''
        Complex logarithm for negative values
    
        Returns log(val) + I*pi
        '''
        if val < 0.0:
            return complex(np.log(np.abs(val)), np.pi)
        return np.log(val) 
    

However, scipy.optimize cannot handle complex-valued functions, so this supposed fix has not worked so far. Any recommendations for preventing or handling these behaviors?

Thanks for reading this far. Any help is much appreciated.

Nick Settje
  • 3,022
  • 1
  • 11
  • 18

0 Answers0