I am trying to write down the MAP updates for EM in the case of a mixture of Bernoulli distributions.

I know that for ML estimates, we have:

E-step: compute P(Z | X, p, t)
M-step: (p, t) <- argmax_{p,t} sum_Z P(Z | X, p, t) log P(X, Z | p, t)

where p are the Bernoulli parameter vectors for each class (K of them, each of size D, where K is the number of classes and D is the number of features) and t are the mixing (multinomial) parameters over the classes.
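For concreteness, here is a minimal NumPy sketch of the ML version I have in mind (the toy initialization and variable names are just mine):

```python
import numpy as np

def em_bernoulli_mixture_ml(X, K, n_iter=50, seed=0):
    """ML EM for a mixture of Bernoullis.
    X: (N, D) binary array, K: number of classes."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    p = rng.uniform(0.25, 0.75, size=(K, D))  # Bernoulli parameters, one row per class
    t = np.full(K, 1.0 / K)                   # mixing proportions

    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] = P(z_n = k | x_n, p, t)
        log_px = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T  # (N, K)
        log_post = np.log(t) + log_px
        log_post -= log_post.max(axis=1, keepdims=True)        # numerical stability
        r = np.exp(log_post)
        r /= r.sum(axis=1, keepdims=True)

        # M-step (ML): argmax of sum_Z P(Z | X, p, t) log P(X, Z | p, t)
        Nk = r.sum(axis=0)                                      # effective counts per class
        p = np.clip((r.T @ X) / Nk[:, None], 1e-6, 1 - 1e-6)
        t = Nk / N
    return p, t
```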

But how do I get MAP estimates? What would p(X) be...?


1 Answer


According to "Machine Learning: A Probabilistic Perspective" by Kevin P. Murphy, page 350:

In the M step, we optimize the Q function (auxiliary function) with respect to theta:

theta^t = argmax_theta Q(theta,theta^{t-1})

which gives the ML estimate. To perform MAP estimation instead, we modify the M step as follows:

theta^t = argmax_theta Q(theta,theta^{t-1})+log(p(theta))

Here theta denotes the parameters, theta^{t-1} is the estimate from the previous iteration, and theta^t is the current one.

where Q is the expected complete-data log likelihood:

Q(theta, theta^{t-1}) = E[log L(theta) | Data, theta^{t-1}]

The E step remains unchanged.
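To connect this with the notation in the question: for the Bernoulli mixture, with responsibilities r_nk = P(z_n = k | x_n, p, t) computed in the E step, the Q function is (standard result, written in the question's p/t notation, with x_nd the d-th feature of the n-th data point):

Q = sum_n sum_k r_nk [ log t_k + sum_d ( x_nd log p_kd + (1 - x_nd) log(1 - p_kd) ) ]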

So the only difference between ML and MAP is that you add log p(theta), the log prior of your parameters, inside the argmax.
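For the Bernoulli mixture in the question, one common choice of prior (my assumption, not something the question fixes) is an independent Beta(a, b) on each p_kd and a Dirichlet(gamma) on the mixing weights t. Up to an additive constant that does not affect the argmax, the log prior is then:

log p(theta) = sum_k sum_d [ (a - 1) log p_kd + (b - 1) log(1 - p_kd) ] + sum_k (gamma_k - 1) log t_k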

For a specific example where the prior p(theta) is Beta(alpha, beta) distributed, I can refer you to the last assignment answer here: assignment

It should be straightforward to plug in your own prior or keep it general.
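As an illustration, here is a minimal NumPy sketch of the resulting MAP M step for the Bernoulli mixture with the Beta/Dirichlet priors above (the responsibilities r come from the unchanged E step; variable names are my own, and the closed form comes from maximizing Q + log p(theta)):

```python
import numpy as np

def m_step_map(X, r, a=2.0, b=2.0, gamma=2.0):
    """MAP M step for a Bernoulli mixture.
    X: (N, D) binary data, r: (N, K) responsibilities from the E step.
    Beta(a, b) prior on each Bernoulli parameter, symmetric Dirichlet(gamma)
    prior on the mixing weights; a = b = gamma = 1 recovers the ML update."""
    N, K = r.shape
    Nk = r.sum(axis=0)                                        # (K,) effective counts

    # argmax of Q + log p(theta), in closed form for these priors:
    p = (r.T @ X + a - 1.0) / (Nk[:, None] + a + b - 2.0)     # (K, D) Bernoulli parameters
    t = (Nk + gamma - 1.0) / (N + K * (gamma - 1.0))          # (K,) mixing weights
    return p, t
```

With a, b > 1 the Beta prior also keeps the estimated p_kd away from exactly 0 or 1, which avoids the log(0) problems the plain ML update can run into.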
