
I am currently estimating a Markov-switching model with many parameters by direct optimization of the log-likelihood function (computed via the forward-backward algorithm). I do the numerical optimization using MATLAB's genetic algorithm, since other approaches such as the (mostly gradient- or simplex-based) algorithms in fmincon and fminsearchbnd were not very useful: the likelihood function is not only of very high dimension but also highly nonlinear, with many local maxima. The genetic algorithm seems to work very well. However, I am planning to further increase the dimension of the problem. I have read about an EM algorithm for estimating Markov-switching models. From what I understand, this algorithm produces a sequence of parameter estimates with non-decreasing log-likelihood values, so it seems suitable for estimating models with very many parameters.
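For concreteness, this is roughly the kind of objective I am optimizing directly (not my actual MATLAB code): a minimal NumPy sketch of a scaled forward-algorithm log-likelihood, assuming for illustration a two-state Gaussian model and a particular parameter layout.

```python
import numpy as np
from scipy.stats import norm

def hmm_loglik(theta, y):
    """Scaled forward-algorithm log-likelihood of a two-state Gaussian HMM.

    theta = (p11, p22, mu1, mu2, sigma1, sigma2) is an assumed layout used
    purely for illustration; the real model has many more parameters.
    """
    p11, p22, mu1, mu2, s1, s2 = theta
    P = np.array([[p11, 1.0 - p11],
                  [1.0 - p22, p22]])                          # transition matrix
    pi = np.array([1.0 - p22, 1.0 - p11]) / (2.0 - p11 - p22)  # stationary distribution
    dens = np.column_stack([norm.pdf(y, mu1, s1),              # state-conditional
                            norm.pdf(y, mu2, s2)])             # observation densities

    alpha = pi * dens[0]
    c = alpha.sum()                                            # scaling factor
    loglik = np.log(c)
    alpha /= c
    for t in range(1, len(y)):
        alpha = (alpha @ P) * dens[t]                          # forward recursion
        c = alpha.sum()
        loglik += np.log(c)
        alpha /= c
    return loglik
```

A global optimizer is then handed `lambda th: -hmm_loglik(th, y)` with box bounds; in this sketch something like scipy.optimize.differential_evolution would play the role MATLAB's ga plays in my setup.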

My question is whether the EM algorithm is suitable for my application involving many parameters (perhaps better suited than the genetic algorithm). Speed is not the main concern (the genetic algorithm is already extremely slow), but I would need some assurance of ending up close to the global optimum rather than in one of the many local optima. Do you have any experience or suggestions regarding this?

  • You are talking about calibrating, i.e. estimating parameters of the model, rather than estimating a historical path of states based on a series of observations, right? – Charles Pehlivanian Mar 12 '16 at 19:09
  • Yes, the aim is parameter estimation, but this will also yield filter and prediction probabilities (smoothed probabilities can then be obtained via Kim's smoothing algorithm). – InfiniteVariance Mar 12 '16 at 19:27

1 Answer


The EM algorithm finds local optima and does not guarantee that they are global optima. In fact, if you start it off with an HMM in which one of the transition probabilities is zero, that probability will typically never move away from zero, because the corresponding transitions have expected count zero in the expectation step; such starting points therefore have no hope of reaching a global optimum in which that transition probability is nonzero.
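To see why, here is a schematic sketch (not tied to any particular implementation) of the Baum-Welch-style re-estimation of the transition matrix, assuming scaled forward/backward variables and state-conditional densities have already been computed; a zero entry of `A` zeroes every term it multiplies, so the re-estimated entry is zero again.

```python
import numpy as np

def reestimate_transitions(A, alpha, beta, dens):
    """Baum-Welch-style re-estimation of the transition matrix (schematic).

    A            : (K, K) current transition matrix
    alpha, beta  : (T, K) scaled forward / backward variables
    dens         : (T, K) state-conditional observation densities
    """
    T, K = dens.shape
    counts = np.zeros((K, K))
    for t in range(T - 1):
        # expected transition probabilities for the step t -> t+1
        xi = alpha[t][:, None] * A * (dens[t + 1] * beta[t + 1])[None, :]
        counts += xi / xi.sum()
    # M-step: row-normalise the expected counts; a zero in A gives zero
    # expected counts, so the corresponding re-estimated entry stays zero
    return counts / counts.sum(axis=1, keepdims=True)
```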

The standard workaround is to run it from a variety of random parameter settings, pick the highest local optimum found, and hope for the best. You might be slightly reassured if a significant proportion of the runs converge to the same (or an equivalent) best local optimum, on the not-very-reliable theory that anything better would be reached from at least the same fraction of random starts and so would have shown up by now.
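In sketch form (with `em_fit` and `random_init` as hypothetical stand-ins for your own routines), the multi-start recipe is just:

```python
import numpy as np

def multistart_em(y, em_fit, random_init, n_starts=50, tol=1e-3, seed=0):
    """Run EM from many random starting points and keep the best fit.

    em_fit(y, theta0) -> (theta_hat, loglik) and random_init(rng) -> theta0
    are assumed interfaces standing in for your own implementation.
    """
    rng = np.random.default_rng(seed)
    results = [em_fit(y, random_init(rng)) for _ in range(n_starts)]
    best_theta, best_ll = max(results, key=lambda r: r[1])
    # crude reassurance check: how many starts reached (roughly) the same optimum?
    n_at_best = sum(abs(ll - best_ll) < tol for _, ll in results)
    print(f"{n_at_best}/{n_starts} starts reached log-likelihood {best_ll:.2f}")
    return best_theta, best_ll
```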

I haven't worked it out in detail, but the EM algorithm applies to such a general class of problems that I expect that, if it were guaranteed to find the global optimum, it would be capable of solving NP-complete problems with unprecedented efficiency.

mcdowella
  • Thank you for this answer. Zero probabilities in the transition probability matrix are unlikely (and I would not use these as starting values at all). I have already tried many random starting values through fmincon/fminsearchbnd but found most of them converging to quite different solutions (with somewhat similar likelihood function values). Does this point to a behavior of the likelihood function that will also cause the EM algorithm to converge to quite different solutions, or is it more 'robust' in this sense? – InfiniteVariance Mar 12 '16 at 20:30
  • The EM algorithm is still used and taught, so I presume that in the situations where it applies it is more effective than something like fmincon, which appears to be a general-purpose optimization routine that does not require user-supplied derivatives. I have not compared the two, but I would guess that the EM algorithm might allow you to do more runs in the same amount of time, so even if there are just as many local optima you would have a better chance of finding a really good one. Two solutions with very similar likelihoods may really be the same solution with the parameters permuted (label switching). – mcdowella Mar 12 '16 at 21:21
  • It is also worthwhile setting up a simulated problem to which you know the right answer and using this to test the program you have written to solve these problems, both for bugs (easy problems with obvious answers are good for this) and for statistical power. For the latter you could use the best answer retrieved from the real problem as the right answer for the simulation (a minimal sketch of such a check appears after these comments). See also https://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29#Parametric_bootstrap. – mcdowella Mar 13 '16 at 07:15
  • Thanks for coming back to this. I have already validated my existing code through simulation, but I guess the only way to find out how EM compares to the algorithms tried so far is actually implementing it and trying it out. Thanks again! – InfiniteVariance Mar 13 '16 at 13:03
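For reference, a minimal sketch of the kind of simulation check discussed above: simulate data from a Gaussian HMM with known (made-up) parameters, then check that the estimation code recovers them approximately. The parameter values and the two-state Gaussian setup here are purely illustrative.

```python
import numpy as np

def simulate_gaussian_hmm(P, mus, sigmas, T, seed=0):
    """Simulate T observations from a Gaussian HMM with known parameters,
    so estimation code can be checked against the truth (a parametric-
    bootstrap-style sanity test)."""
    rng = np.random.default_rng(seed)
    K = P.shape[0]
    states = np.empty(T, dtype=int)
    y = np.empty(T)
    states[0] = rng.integers(K)
    for t in range(T):
        if t > 0:
            states[t] = rng.choice(K, p=P[states[t - 1]])
        y[t] = rng.normal(mus[states[t]], sigmas[states[t]])
    return y, states

# Example: two well-separated, persistent regimes; a correct estimator
# should recover these parameters approximately on a long sample.
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
y, s = simulate_gaussian_hmm(P, mus=[-1.0, 2.0], sigmas=[0.5, 1.0], T=5000)
```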