In a typical clustering problem, the probability of a data point x is p(x) = sum_k p(k) p(x|k), where k is a latent variable specifying the cluster that x belongs to. We can use the EM algorithm to maximize the log likelihood over the training set: sum_n log(sum_k p(k) p(x_n|k)).

I wonder whether the EM algorithm can solve a problem with two sets of latent variables, i.e. p(x) = sum_k sum_l p(k) p(l) p(x|k,l). If so, how can we do that?

What if all of the probability distributions are sigmoid functions?

1 Answer

This should be just a straightforward application of the EM algorithm as a way of solving hidden-data problems - the hidden data are the underlying values of k and l for each data point. In the E step you work out the expected log likelihood, considering each possible value of the pair (k, l) and weighting it by its probability given the data and the current parameter settings. In the M step you find the parameters that maximise this expected log likelihood. This is very similar to encoding the pair (k, l) as a single index m, except that there is more structure in p(k)p(l) than in p(m), which affects the M step very slightly.
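
To make this concrete, here is a minimal numerical sketch (my own illustration, not code from the question), assuming each component p(x|k,l) is a 1-D Gaussian with its own mean and variance. Note how the factorisation p(k)p(l) shows up only in the mixing-weight updates of the M step:

    import numpy as np

    def em_two_latents(x, K, L, n_iter=100, seed=0):
        """EM for p(x) = sum_k sum_l p(k) p(l) N(x; mu[k,l], var[k,l])."""
        rng = np.random.default_rng(seed)
        N = len(x)
        pk = np.full(K, 1.0 / K)                         # p(k)
        pl = np.full(L, 1.0 / L)                         # p(l)
        mu = rng.normal(x.mean(), x.std(), size=(K, L))  # component means
        var = np.full((K, L), x.var())                   # component variances

        for _ in range(n_iter):
            # E step: responsibilities r[n,k,l] proportional to p(k) p(l) p(x_n|k,l)
            dens = np.exp(-0.5 * (x[:, None, None] - mu) ** 2 / var) \
                   / np.sqrt(2 * np.pi * var)
            r = pk[None, :, None] * pl[None, None, :] * dens
            loglik = np.log(r.sum(axis=(1, 2))).sum()
            r /= r.sum(axis=(1, 2), keepdims=True)

            # M step: because the weights factorise as p(k)p(l), each factor
            # is re-estimated from the marginal responsibilities
            pk = r.sum(axis=(0, 2)) / N
            pl = r.sum(axis=(0, 1)) / N
            w = r.sum(axis=0) + 1e-12                    # total weight per (k,l)
            mu = (r * x[:, None, None]).sum(axis=0) / w
            var = (r * (x[:, None, None] - mu) ** 2).sum(axis=0) / w + 1e-12
        return pk, pl, mu, var, loglik

A quick sanity check, echoed in the comments below: the log likelihood computed here should never decrease from one iteration to the next.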

If the probabilities are sigmoid - or any other probability distribution - the justification of the EM algorithm still holds: each step either increases the log likelihood or leaves it unchanged. However, you may find that the M step becomes more expensive if the optimisation problem gets harder.
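
To illustrate the "more expensive M step" point: if, say, x is binary and p(x=1|k,l) = sigmoid(feats . w[k,l]) for some hypothetical weight vectors w[k,l] (my notation, not from the question), there is no closed-form maximiser, so a generalised-EM variant replaces the exact M step with a few gradient-ascent steps on the expected complete-data log likelihood:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def generalised_m_step(w, feats, x, r, lr=0.1, n_steps=5):
        """Partial M step: gradient ascent on the expected complete-data
        log likelihood, with p(x=1|k,l) = sigmoid(feats . w[k,l]).
        w: (K, L, D) weights, feats: (N, D), x: (N,) in {0, 1},
        r: (N, K, L) responsibilities from the E step."""
        for _ in range(n_steps):
            logits = np.einsum('nd,kld->nkl', feats, w)
            p = sigmoid(logits)
            # responsibility-weighted logistic-regression gradient
            grad = np.einsum('nkl,nd->kld', r * (x[:, None, None] - p), feats)
            w = w + lr * grad
        return w

Each such partial M step still increases (or leaves unchanged) the expected log likelihood, so the convergence guarantee above is preserved even without an exact maximiser.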

  • Hi mcdowella, many thanks for your answer. So what you mean is to combine p(k)p(l) into p(k,l), and treat (k,l) as a single hidden variable? – Lei Yu Dec 17 '13 at 10:14
  • You can simplify the problem by combining k and l and treating (k,l) as a single hidden variable. If you do this you have the choice of fitting p(k,l) as an arbitrary collection of probabilities - in which case the estimation will be exactly like the single-parameter case - or as p(k,l) = p(k)p(l), in which case you have fewer parameters to fit and the estimation will be slightly different but should be fairly straightforward. A good check on EM is to verify that the likelihood increases with each iteration until convergence. You can also test it on made-up data for which you know the right answer. – mcdowella Dec 17 '13 at 11:04