In a typical clustering problem, the probability of a data point $x$ is $p(x) = \sum_k p(k)\,p(x \mid k)$, where $k$ is a latent variable specifying the cluster that $x$ belongs to. We can use the EM algorithm to maximize the log-likelihood of the training set, $\sum_n \log \sum_k p(k)\,p(x_n \mid k)$.
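To make the setup concrete, here is a minimal sketch of EM for the single-latent-variable case above, assuming 1-D Gaussian components $p(x \mid k) = \mathcal{N}(x \mid \mu_k, \sigma_k^2)$; the toy data, initial values, and variable names are illustrative, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated 1-D Gaussian clusters (assumed example data).
x = np.concatenate([rng.normal(-4.0, 1.0, 200), rng.normal(4.0, 1.0, 200)])

K = 2
# Initialize mixing weights p(k), means, and variances (hypothetical starting values).
pi = np.full(K, 1.0 / K)
mu = np.array([-1.0, 1.0])
var = np.ones(K)

def log_gauss(x, mu, var):
    """Log density of N(mu, var), broadcasting over components."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

for _ in range(50):
    # E-step: responsibilities r[n, k] = p(k | x_n), computed in log space.
    log_r = np.log(pi) + log_gauss(x[:, None], mu, var)
    log_r -= log_r.max(axis=1, keepdims=True)  # subtract max for numerical stability
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: re-estimate p(k), mu_k, var_k from the responsibility-weighted data.
    Nk = r.sum(axis=0)
    pi = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
```

Each iteration increases the log-likelihood $\sum_n \log \sum_k p(k)\,p(x_n \mid k)$, and with this data the two means should end up near the cluster centers.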
I wonder whether the EM algorithm can also handle a problem with two sets of latent variables, i.e. $p(x) = \sum_k \sum_l p(x \mid k, l)\,p(k)\,p(l)$? If so, how can we do that?
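One way this could work, sketched under the same Gaussian assumption: treat each pair $(k, l)$ as a joint component, so the E-step computes responsibilities $r_{nkl} \propto p(k)\,p(l)\,p(x_n \mid k, l)$ exactly as for a $K \cdot L$-component mixture, while the M-step updates $p(k)$ and $p(l)$ from the corresponding marginals of $r$. The grid of means `mu[k, l]` and all data below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D data from 4 components whose true means lie on a 2x2 grid
# (an assumed structure, purely for illustration): [-4, -2, 2, 4].
true_means = np.array([-4.0, -2.0, 2.0, 4.0])
x = np.concatenate([rng.normal(m, 0.5, 150) for m in true_means])

K, L = 2, 2
pk = np.full(K, 1.0 / K)                          # p(k)
pl = np.full(L, 1.0 / L)                          # p(l)
mu = np.array([[-3.5, -1.5], [1.5, 3.5]])         # mu[k, l], hypothetical init
var = np.ones((K, L))

for _ in range(100):
    # E-step: joint responsibilities r[n, k, l] ∝ p(k) p(l) N(x_n | mu[k,l], var[k,l]).
    log_r = (np.log(pk)[:, None] + np.log(pl)[None, :]
             - 0.5 * (np.log(2 * np.pi * var) + (x[:, None, None] - mu) ** 2 / var))
    log_r -= log_r.max(axis=(1, 2), keepdims=True)  # numerical stability
    r = np.exp(log_r)
    r /= r.sum(axis=(1, 2), keepdims=True)

    # M-step: each factor is re-estimated from the appropriate marginal of r.
    Nkl = r.sum(axis=0)
    pk = Nkl.sum(axis=1) / len(x)   # marginalize responsibilities over l
    pl = Nkl.sum(axis=0) / len(x)   # marginalize responsibilities over k
    mu = (r * x[:, None, None]).sum(axis=0) / Nkl
    var = (r * (x[:, None, None] - mu) ** 2).sum(axis=0) / Nkl
```

The only difference from an ordinary $K \cdot L$-component mixture is in the M-step: the mixing weights are constrained to factorize as $p(k)\,p(l)$, and maximizing the expected complete-data log-likelihood under that constraint gives the marginal updates shown above.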
What if all of the probability distributions are sigmoid functions?