I am trying to write down the MAP updates for EM in the case of a mixture of Bernoulli distributions.

I know that for ML estimates, we have:

E-step: compute P(Z | X, p, t)
M-step: (p, t) <- argmax_{p,t} sum_Z P(Z | X, p, t) log P(X, Z | p, t)

where p are the Bernoulli parameter vectors for each class (K of them, each of size D, where K is the number of classes and D is the number of features) and t are the mixing (multinomial) parameters over the classes.
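For concreteness, here is a minimal NumPy sketch of the ML version I have in mind (the toy initialization and variable names are just mine):

```python
import numpy as np

def em_bernoulli_mixture_ml(X, K, n_iter=50, seed=0):
    """ML EM for a mixture of Bernoullis.
    X: (N, D) binary array, K: number of classes."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    p = rng.uniform(0.25, 0.75, size=(K, D))  # Bernoulli parameters, one row per class
    t = np.full(K, 1.0 / K)                   # mixing proportions

    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] = P(z_n = k | x_n, p, t)
        log_px = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T  # (N, K)
        log_post = np.log(t) + log_px
        log_post -= log_post.max(axis=1, keepdims=True)        # numerical stability
        r = np.exp(log_post)
        r /= r.sum(axis=1, keepdims=True)

        # M-step (ML): argmax of sum_Z P(Z | X, p, t) log P(X, Z | p, t)
        Nk = r.sum(axis=0)                                      # effective counts per class
        p = np.clip((r.T @ X) / Nk[:, None], 1e-6, 1 - 1e-6)
        t = Nk / N
    return p, t
```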

But how do I get MAP estimates? What would p(X) be...?


1 Answer


According to "Machine Learning: A Probabilistic Perspective" by Kevin P. Murphy, page 350:

In the M step, we optimize the Q function (auxiliary function) with respect to theta:

theta^t = argmax_theta Q(theta,theta^{t-1})

which gives the ML estimate. To perform MAP estimation instead, we modify the M step as follows:

theta^t = argmax_theta Q(theta,theta^{t-1})+log(p(theta))

Here theta denotes the parameters, theta^{t-1} is the estimate from the previous iteration, and theta^t is the current one.

where Q is the expected complete-data log likelihood:

Q(theta, theta^{t-1}) = E[log L(theta) | Data, theta^{t-1}]

The E step remains unchanged.
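To connect this with the notation in the question: for the Bernoulli mixture, with responsibilities r_nk = P(z_n = k | x_n, p, t) computed in the E step, the Q function is (standard result, written in the question's p/t notation, with x_nd the d-th feature of the n-th data point):

Q = sum_n sum_k r_nk [ log t_k + sum_d ( x_nd log p_kd + (1 - x_nd) log(1 - p_kd) ) ]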

So the only difference between ML and MAP is that you add log p(theta), the log prior of your parameters, inside the argmax.
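For the Bernoulli mixture in the question, one common choice of prior (my assumption, not something the question fixes) is an independent Beta(a, b) on each p_kd and a Dirichlet(gamma) on the mixing weights t. Up to an additive constant that does not affect the argmax, the log prior is then:

log p(theta) = sum_k sum_d [ (a - 1) log p_kd + (b - 1) log(1 - p_kd) ] + sum_k (gamma_k - 1) log t_k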

For a specific example where the prior p(theta) is Beta(alpha, beta) distributed, I can refer you to the last assignment answer here: assignment

It should be straightforward to plug in your own prior or keep it general.
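As an illustration, here is a minimal NumPy sketch of the resulting MAP M step for the Bernoulli mixture with the Beta/Dirichlet priors above (the responsibilities r come from the unchanged E step; variable names are my own, and the closed form comes from maximizing Q + log p(theta)):

```python
import numpy as np

def m_step_map(X, r, a=2.0, b=2.0, gamma=2.0):
    """MAP M step for a Bernoulli mixture.
    X: (N, D) binary data, r: (N, K) responsibilities from the E step.
    Beta(a, b) prior on each Bernoulli parameter, symmetric Dirichlet(gamma)
    prior on the mixing weights; a = b = gamma = 1 recovers the ML update."""
    N, K = r.shape
    Nk = r.sum(axis=0)                                        # (K,) effective counts

    # argmax of Q + log p(theta), in closed form for these priors:
    p = (r.T @ X + a - 1.0) / (Nk[:, None] + a + b - 2.0)     # (K, D) Bernoulli parameters
    t = (Nk + gamma - 1.0) / (N + K * (gamma - 1.0))          # (K,) mixing weights
    return p, t
```

With a, b > 1 the Beta prior also keeps the estimated p_kd away from exactly 0 or 1, which avoids the log(0) problems the plain ML update can run into.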
