In a typical clustering problem, the probability of a data point x is p(x) = sum_k p(k) p(x|k), where k is a latent variable specifying the cluster that x belongs to. We can use the EM algorithm to maximize the log likelihood over the training set: sum_n log(sum_k p(k) p(x_n|k)).

I wonder whether the EM algorithm can solve a problem with two sets of latent variables, i.e. p(x) = sum_k sum_l p(k) p(l) p(x|k,l). If so, how can we do that?

What if all of the probability distributions are sigmoid functions?

1 Answer

This should be just a straightforward application of the EM algorithm as a way of solving hidden-data problems - the hidden data are the underlying values of k and l for each data point. In the E step you work out the expected log likelihood, considering each possible value of the pair (k, l) and weighting it by its probability given the data and the current parameter settings. In the M step you find the parameters that maximise this expected log likelihood. This is very similar to encoding the pair (k, l) as a single index m, except that there is more structure in p(k)p(l) than in p(m), which affects the M step very slightly.
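
To make this concrete, here is a minimal numerical sketch (my own illustration, not code from the question), assuming each component p(x|k,l) is a 1-D Gaussian with its own mean and variance. Note how the factorisation p(k)p(l) shows up only in the mixing-weight updates of the M step:

    import numpy as np

    def em_two_latents(x, K, L, n_iter=100, seed=0):
        """EM for p(x) = sum_k sum_l p(k) p(l) N(x; mu[k,l], var[k,l])."""
        rng = np.random.default_rng(seed)
        N = len(x)
        pk = np.full(K, 1.0 / K)                         # p(k)
        pl = np.full(L, 1.0 / L)                         # p(l)
        mu = rng.normal(x.mean(), x.std(), size=(K, L))  # component means
        var = np.full((K, L), x.var())                   # component variances

        for _ in range(n_iter):
            # E step: responsibilities r[n,k,l] proportional to p(k) p(l) p(x_n|k,l)
            dens = np.exp(-0.5 * (x[:, None, None] - mu) ** 2 / var) \
                   / np.sqrt(2 * np.pi * var)
            r = pk[None, :, None] * pl[None, None, :] * dens
            loglik = np.log(r.sum(axis=(1, 2))).sum()
            r /= r.sum(axis=(1, 2), keepdims=True)

            # M step: because the weights factorise as p(k)p(l), each factor
            # is re-estimated from the marginal responsibilities
            pk = r.sum(axis=(0, 2)) / N
            pl = r.sum(axis=(0, 1)) / N
            w = r.sum(axis=0) + 1e-12                    # total weight per (k,l)
            mu = (r * x[:, None, None]).sum(axis=0) / w
            var = (r * (x[:, None, None] - mu) ** 2).sum(axis=0) / w + 1e-12
        return pk, pl, mu, var, loglik

A quick sanity check, echoed in the comments below: the log likelihood computed here should never decrease from one iteration to the next.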

If the probabilities are sigmoid - or any other probability distribution - the justification of the EM algorithm still holds: each step either increases the log likelihood or leaves it unchanged. However, you may find that the M step becomes more expensive if the optimisation problem gets harder.
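
To illustrate the "more expensive M step" point: if, say, x is binary and p(x=1|k,l) = sigmoid(feats . w[k,l]) for some hypothetical weight vectors w[k,l] (my notation, not from the question), there is no closed-form maximiser, so a generalised-EM variant replaces the exact M step with a few gradient-ascent steps on the expected complete-data log likelihood:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def generalised_m_step(w, feats, x, r, lr=0.1, n_steps=5):
        """Partial M step: gradient ascent on the expected complete-data
        log likelihood, with p(x=1|k,l) = sigmoid(feats . w[k,l]).
        w: (K, L, D) weights, feats: (N, D), x: (N,) in {0, 1},
        r: (N, K, L) responsibilities from the E step."""
        for _ in range(n_steps):
            logits = np.einsum('nd,kld->nkl', feats, w)
            p = sigmoid(logits)
            # responsibility-weighted logistic-regression gradient
            grad = np.einsum('nkl,nd->kld', r * (x[:, None, None] - p), feats)
            w = w + lr * grad
        return w

Each such partial M step still increases (or leaves unchanged) the expected log likelihood, so the convergence guarantee above is preserved even without an exact maximiser.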

  • Hi mcdowella, many thanks for your answer. So what you mean is to combine p(k)p(l) into p(k,l), and treat (k,l) as a single hidden variable? – Lei Yu Dec 17 '13 at 10:14
  • You can simplify the problem by combining k and l and treating (k,l) as a single hidden variable. If you do this you have the choice of fitting p(k,l) as an arbitrary collection of probabilities - in which case the estimation will be exactly like the single-parameter case - or as p(k,l) = p(k)p(l), in which case you have fewer parameters to fit and the estimation will be slightly different but should be fairly straightforward. A good check on EM is to verify that the likelihood increases with each iteration until convergence. You can also test it on made-up data for which you know the right answer. – mcdowella Dec 17 '13 at 11:04