simplifying the formula for EM

Question

I want to compute below formula in Matlab (E-step of EM for Multinomial Mixture Model),
enter image description here

g and θ are matrix , θ and λ have below constrains:
enter image description here

but count of m is more than 1593 and when compute product of θ, number get very small and Matlab save it with zero. Anyone can simplifying the g formula or use other tricks to solve this problem?

update:

data: data.txt (after downloads, change file extension to 'mat')

code:

function EM(data)
%% initialize
K=2;
[N M]=size(data);
g=zeros(N,K);
landa=ones(K,1) .* 0.5;
theta = rand(M, K);
theta = bsxfun(@rdivide, theta, sum(theta,1))';
%% EM
for i=1:10
%% E Step
    for n=1:N
        normalize=0;
        for k=1:K
            g(n,k)=landa(k) * prod(theta(k,:) .^ data(n,:));
            normalize=normalize + landa(k) * prod(theta(k,:) .^ data(n,:));
        end
        g(n,:)=g(n,:) ./ normalize;
    end
%% M Step 
    for k=1:K
        landa(k)=sum(g(:,k)) / N ;
        for m=1:M
            theta(k,m)=(sum(g(:,k) .* data(:,m)) + 1) / (sum(g(:,k) .* sum(data,2)) + M);
        end
    end
end

end

It's the same product in nominator and denominator. Divide both by it. — Daniel, Jan 24 '15 at 13:15
@Daniel, I suspect that's a typo. The equation makes more sense if its j in the denominator instead of k. Its a normalisation. — A. Donda, Jan 24 '15 at 15:41
Micle: Try to *add* the *logarithms* instead of multiplying the values, and then take the exponential function in the end to recover the product. — A. Donda, Jan 24 '15 at 15:42
@Micle: You are using vpa wrong. `vpa(2e-400)` is the same as `vpa(0)` because input is a double. Use `vpa('2e-400')` or `2*10^-vpa(400)`. — Daniel, Jan 24 '15 at 15:58
thanks A. Donda, fix `k` in post, you say first logarithms from θ then compute multiplying and after all use exponential? — Micle, Jan 24 '15 at 15:59
Daniel: but how to use `vpa` and prod with together: `for n=1:N normalize=0; for k=1:K g(n,k)=landa(k) * prod(theta(k,:) .^ data(n,:)); normalize=normalize + landa(k) * prod(theta(k,:) .^ data(n,:)); end g(n,:)=g(n,:) ./ normalize; end` — Micle, Jan 24 '15 at 16:12
@Micle I suppose you are trying to maximize the function. As such you can take the logarithm of your function as suggested by A. Donda as it does not change the maximum. It will change your prod into a sum — ASantosRibeiro, Jan 24 '15 at 17:30
@Daniel, i used `vpa` like `vpa('2e-400')` and result not goes to zero, but in this form `g` is array of `sym` and operations very very slow. — Micle, Jan 24 '15 at 17:46
@ASantosRibeiro , logarithm of sum is not equal to sum of logarithm, so prod still needed. — Micle, Jan 24 '15 at 18:16
Could you put your code and example data into your question? I tried to get it running but somewhere I use wrong input dimensions. — Daniel, Jan 24 '15 at 18:27
Micle, it would really help if you could give us something we can try out ourselves: your whole implementation, along with some example data where that problem occurs. — A. Donda, Jan 24 '15 at 18:42
Thanks! I'll be having a look. Could you also briefly explain the background: What are the data (looks like count?) what are the parameters you're trying to estimate? — A. Donda, Jan 24 '15 at 18:56
text clustering using Expectation and Maximization(EM). rows of matrix are documents and columns are words,for example data(1,2) = 3 :: word 2 has 3 time repeated in document 1; θ is parameter of MultiNomial distribution. — Micle, Jan 24 '15 at 19:07
Have a look at my answer. I don't know if the result of the algorithm makes sense now, but at least there are no underflow-zeros and nans any more. — A. Donda, Jan 24 '15 at 19:30

A. Donda · Accepted Answer · 2015-01-24T19:34:01.880

You can use computations on logarithms instead of the actual values to avoid underflow problems.

To start things, we slightly reformat the E step code:

for n = 1 : N
    for k = 1 : K
        g(n, k) = lambda(k) * prod(theta(k, :) .^ data(n, :));
    end
end
g = bsxfun(@rdivide, g, sum(g, 2));

So instead of accumulating the denominator in an extra variable normalize, we do the normalization in one step after both loops.

Now we introduce a variable lg with contains the logarithm of g:

for n = 1 : N
    for k = 1 : K
        lg(n, k) = log(lambda(k)) + sum(log(theta(k, :)) .* data(n, :));
    end
end
g = exp(lg);
g = bsxfun(@rdivide, g, sum(g, 2));

So far, nothing is achieved. The underflow is just moved from within the loop to the conversion from lg to g via the exponential afterwards.

But, in the next line there is the normalization step, which means that the correct value of g is not really necessary: All that is important is that different values have the correct ratios between them. This means we can divide all values that jointly enter a normalization by an arbitrary constant, without changing the end result. On the logarithmic scale, this means subtracting something, and we choose this something to be the arithmetic mean of lg (corresponding to the harmonic mean of g):

lg = bsxfun(@minus, lg, mean(lg, 2));
g = exp(lg);
g = bsxfun(@rdivide, g, sum(g, 2));

Via the subtraction, logarithmic values are moved from something like -2000, which doesn't survive the exponential, to something like +50 or -30. Values of g now are sensible, and can be easily normalized to reach the correct end result.

thank you so much.code work's perfectly and explains very well. — Micle, Jan 24 '15 at 19:45
Same here. Prod is usually solved with logarithms. It seems that it happened again. — ASantosRibeiro, Jan 27 '15 at 18:57

simplifying the formula for EM

1 Answers1