1

I want to compute below formula in Matlab (E-step of EM for Multinomial Mixture Model),
enter image description here

g and θ are matrix , θ and λ have below constrains:
enter image description here
enter image description here
but count of m is more than 1593 and when compute product of θ, number get very small and Matlab save it with zero. Anyone can simplifying the g formula or use other tricks to solve this problem?

update:

data: data.txt (after downloads, change file extension to 'mat')

code:

function EM(data)
%% initialize
K=2;
[N M]=size(data);
g=zeros(N,K);
landa=ones(K,1) .* 0.5;
theta = rand(M, K);
theta = bsxfun(@rdivide, theta, sum(theta,1))';
%% EM
for i=1:10
%% E Step
    for n=1:N
        normalize=0;
        for k=1:K
            g(n,k)=landa(k) * prod(theta(k,:) .^ data(n,:));
            normalize=normalize + landa(k) * prod(theta(k,:) .^ data(n,:));
        end
        g(n,:)=g(n,:) ./ normalize;
    end
%% M Step 
    for k=1:K
        landa(k)=sum(g(:,k)) / N ;
        for m=1:M
            theta(k,m)=(sum(g(:,k) .* data(:,m)) + 1) / (sum(g(:,k) .* sum(data,2)) + M);
        end
    end
end

end

Micle
  • 159
  • 10
  • Did you try `vpa` to increase the precision? – Daniel Jan 24 '15 at 13:12
  • It's the same product in nominator and denominator. Divide both by it. – Daniel Jan 24 '15 at 13:15
  • @Daniel, I suspect that's a typo. The equation makes more sense if its j in the denominator instead of k. Its a normalisation. – A. Donda Jan 24 '15 at 15:41
  • Micle: Try to *add* the *logarithms* instead of multiplying the values, and then take the exponential function in the end to recover the product. – A. Donda Jan 24 '15 at 15:42
  • thanks,i used `vpa` like this `vpa(2e-400)' but not work, – Micle Jan 24 '15 at 15:49
  • @Micle: You are using vpa wrong. `vpa(2e-400)` is the same as `vpa(0)` because input is a double. Use `vpa('2e-400')` or `2*10^-vpa(400)`. – Daniel Jan 24 '15 at 15:58
  • thanks A. Donda, fix `k` in post, you say first logarithms from θ then compute multiplying and after all use exponential? – Micle Jan 24 '15 at 15:59
  • Daniel: but how to use `vpa` and prod with together: `for n=1:N normalize=0; for k=1:K g(n,k)=landa(k) * prod(theta(k,:) .^ data(n,:)); normalize=normalize + landa(k) * prod(theta(k,:) .^ data(n,:)); end g(n,:)=g(n,:) ./ normalize; end` – Micle Jan 24 '15 at 16:12
  • @Micle I suppose you are trying to maximize the function. As such you can take the logarithm of your function as suggested by A. Donda as it does not change the maximum. It will change your prod into a sum – ASantosRibeiro Jan 24 '15 at 17:30
  • @Daniel, i used `vpa` like `vpa('2e-400')` and result not goes to zero, but in this form `g` is array of `sym` and operations very very slow. – Micle Jan 24 '15 at 17:46
  • @ASantosRibeiro , logarithm of sum is not equal to sum of logarithm, so prod still needed. – Micle Jan 24 '15 at 18:16
  • Could you put your code and example data into your question? I tried to get it running but somewhere I use wrong input dimensions. – Daniel Jan 24 '15 at 18:27
  • Micle, it would really help if you could give us something we can try out ourselves: your whole implementation, along with some example data where that problem occurs. – A. Donda Jan 24 '15 at 18:42
  • ok, I'm add code and data to post. – Micle Jan 24 '15 at 18:45
  • Thanks! I'll be having a look. Could you also briefly explain the background: What are the data (looks like count?) what are the parameters you're trying to estimate? – A. Donda Jan 24 '15 at 18:56
  • 1
    text clustering using Expectation and Maximization(EM). rows of matrix are documents and columns are words,for example data(1,2) = 3 :: word 2 has 3 time repeated in document 1; θ is parameter of MultiNomial distribution. – Micle Jan 24 '15 at 19:07
  • Have a look at my answer. I don't know if the result of the algorithm makes sense now, but at least there are no underflow-zeros and nans any more. – A. Donda Jan 24 '15 at 19:30

1 Answers1

5

You can use computations on logarithms instead of the actual values to avoid underflow problems.

To start things, we slightly reformat the E step code:

for n = 1 : N
    for k = 1 : K
        g(n, k) = lambda(k) * prod(theta(k, :) .^ data(n, :));
    end
end
g = bsxfun(@rdivide, g, sum(g, 2));

So instead of accumulating the denominator in an extra variable normalize, we do the normalization in one step after both loops.

Now we introduce a variable lg with contains the logarithm of g:

for n = 1 : N
    for k = 1 : K
        lg(n, k) = log(lambda(k)) + sum(log(theta(k, :)) .* data(n, :));
    end
end
g = exp(lg);
g = bsxfun(@rdivide, g, sum(g, 2));

So far, nothing is achieved. The underflow is just moved from within the loop to the conversion from lg to g via the exponential afterwards.

But, in the next line there is the normalization step, which means that the correct value of g is not really necessary: All that is important is that different values have the correct ratios between them. This means we can divide all values that jointly enter a normalization by an arbitrary constant, without changing the end result. On the logarithmic scale, this means subtracting something, and we choose this something to be the arithmetic mean of lg (corresponding to the harmonic mean of g):

lg = bsxfun(@minus, lg, mean(lg, 2));
g = exp(lg);
g = bsxfun(@rdivide, g, sum(g, 2));

Via the subtraction, logarithmic values are moved from something like -2000, which doesn't survive the exponential, to something like +50 or -30. Values of g now are sensible, and can be easily normalized to reach the correct end result.

A. Donda
  • 8,381
  • 2
  • 20
  • 49