
I am implementing the Baum-Welch algorithm in MATLAB from this Wikipedia article: Baum-Welch algorithm. It is part of my volatility forecasting for financial time series.
I have two questions:

1: At the end of the update step on the Wikipedia page, it says: "These steps are now repeated iteratively until a desired level of convergence."
So how do I declare a condition to stop the loop? And which variables should be evaluated to decide whether they are acceptable?

2: If you look closely at Wikipedia's formula for kesi (xi):

kesi(i,j,t) = ( alpha(i,t) * a(i,j) * beta(j,t+1) * b(j,t+1) ) / sum over states k of alpha(k,T)

where b(j,t+1) is the emission probability of the observation at time t+1.

There are two scaled factors in the numerator (alpha and beta) but only one in the denominator (just alpha), so the scaling constants do not cancel each other out. I have implemented the equations (in a loop that repeats 20 times, for example) and applied the scaling procedure, but the transition probability matrix, the initial distribution, and the emission matrix all get NaN values!

I have read the answer to this question: Baum-Welch many possible observations, and I did the scaling based on the tutorial mentioned there.
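If I understand the tutorial correctly, alpha and beta have to be scaled with the same constants c(t), so that the constants cancel when kesi is normalized at each time step. Here is a minimal sketch of what I think the consistent version looks like (alpha_hat and beta_hat are my names for the scaled variables, not the tutorial's):

for t = 1:T-1
    for i = 1:n
        for j = 1:n
            kesi(i,j,t) = alpha_hat(t,i) * P_star(i,j) * Q_star(j,t+1) * beta_hat(t+1,j);
        end
    end
    kesi(:,:,t) = kesi(:,:,t) / sum(sum(kesi(:,:,t)));  % per-t normalizer instead of sum(alpha(:,T))
end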
Here is my code:

n = 3;      % number of states
T = 20;     % number of observations
%do some initializing things with random values and compute gamma, kesi, ...
index = 20;     % fixed number of iterations for now
while index >= 0
pi_star = gamma(:,1)';      % update initial distribution: pi_i = gamma_1(i)
P_star = zeros(n,n);        % update transition matrix
for i_2=1:n
    makhraj = sum(gamma(i_2,:));        % makhraj = denominator (Persian)
    for j=1:n
        sorat = sum(kesi(i_2,j,:));     % sorat = numerator (Persian)
        P_star(i_2,j) = sorat / makhraj;
    end
end
Q_star = zeros(n,T);        % update emission probabilities, stored per time step: Q_star(j,t) = b_j(Obs(t))
for t=1:T
    for i_2=1:n
        makhraj = sum(gamma(i_2,:));
        sorat = 0;
        for h=1:T
            if Obs(h) == Obs(t)
                sorat = sorat + gamma(i_2,h);   % accumulate gamma over all times h where Obs(h) == Obs(t)
            end
        end
        Q_star(i_2,t) = sorat / makhraj;
    end
end
%computing the forward probabilities
alpha = zeros(T,n);
for i_2=1:n
    alpha(1,i_2) = pi_star(1,i_2) * Q_star(i_2,1);  % initialization: alpha_1(i) = pi_i * b_i(Obs(1))
end
for t=2:T
    for j=1:n
        alpha(t,j) = (alpha(t-1,:)*P_star(:,j)) * Q_star(j,t);  % induction: sum_i alpha_{t-1}(i)*a(i,j)*b_j(Obs(t))
    end
end
%%% scaling forward probabilities
for t=1:T
    c(t) = 1 / sum(alpha(t,:));     % store the scale factor of each time step (useful for the log-likelihood)
    for i2=1:n
        alpha(t,i2) = alpha(t,i2) * c(t);
    end
end
%computing the backward probabilities
beta = zeros(T,n);
beta(T,:) = 1;                  % termination: beta_T(i) = 1
for t=(T-1):(-1):1
    rightVector = Q_star(:,t+1) .* beta(t+1,:)';
    beta(t,:) = (P_star * rightVector)';
end
%%% scaling backward probabilities (note: uses beta's own row sums, not the c(t) from alpha)
for t=1:T
    d = 1 / sum(beta(t,:));
    for i2=1:n
        beta(t,i2) = beta(t,i2) * d;
    end
end
%computing gamma variable
sigma_ab = zeros(1,T);
for t=1:T
    for j=1:n
        sigma_ab(1,t) = sigma_ab(1,t) + (alpha(t,j)*beta(t,j));
    end
end
for t=1:T
    for j=1:n
        gamma(j,t) = ((alpha(t,j)*beta(t,j))/sigma_ab(1,t));
    end
end
%computing kesi (denominator: sum of the scaled alpha over states at time t,
%which is always 1 after the scaling above)
makhraj_k = zeros(1,T);
for t=1:T
    for i_2=1:n
        makhraj_k(1,t) = makhraj_k(1,t) + alpha(t,i_2);
    end
end
for t=1:T-1
    for i_2=1:n
        for j=1:n
            kesi(i_2,j,t) = (alpha(t,i_2)*P_star(i_2,j)*beta(t+1,j)*Q_star(j,t+1))/makhraj_k(1,t);
        end
    end
end
index = index -1;
end %end while

So what should I do now about this scaling problem? Are the NaN values caused by a scaling issue, or by something else?
Thanks for your time.

  • It's been some time since I last worked with HMMs, but you could take a look at Kevin Murphy's [BNT](https://github.com/bayesnet/bnt) and [HMM](http://people.cs.ubc.ca/~murphyk/Software/HMM/hmm.html) toolboxes. The `dhmm_em` and `mhmm_em` functions implement the EM algorithm (Baum-Welch) for the discrete and continuous cases: https://github.com/bayesnet/bnt/tree/master/HMM. MATLAB also has [`hmmtrain`](http://www.mathworks.com/help/stats/hmmtrain.html) in the Statistics Toolbox for the discrete HMM only. You can see how they define the tolerance as a stopping criterion. – Amro Mar 26 '16 at 10:48
  • of course *Lawrence Rabiner*'s seminal paper is indispensable here. There's also a paper, "Numerically Stable Hidden Markov Model Implementation", on scaling factors and the numerical stability of HMM implementations. – Amro Mar 26 '16 at 10:56
  • @Amro thank you, I will take a look at those. But I have to port this code to C and don't want to use predefined MATLAB functions. I've seen Kevin Murphy's BNT and HMM toolboxes before; in fact I found them a little complex and confusing, and I could not find the answer to my first question there. –  Mar 26 '16 at 16:27
  • I have found the answer to my second question. The NaN values appear when the observation sequence grows; in my case after `T>100`. So I concluded that the scaling procedure could not solve my problem, and I used logarithm space to compare the probabilities. Here is the source: "Numerically Stable Hidden Markov Model Implementation" by Tobias P. Mann. I'm still looking for an answer to my first question. –  Apr 10 '16 at 09:55
  • that's the same paper I mentioned in my previous comment :) Probabilities are small numbers between 0 and 1; multiplying many probabilities together from long sequences can quickly underflow to zero in floating-point arithmetic, so you will likely get a NaN in an intermediate calculation somewhere... The paper suggests working in log-scale, with the nice property that `log(a*b) = log(a) + log(b)` (see the first sketch below). – Amro Apr 10 '16 at 12:09
  • As for convergence, it is usually decided by checking whether the log-likelihood of the data stops improving by much over some number of iterations. So if `LL_new - LL_old < threshold` holds for, say, 5 iterations, you stop. You decide on an acceptable threshold and a maximum number of iterations (see the second sketch below). – Amro Apr 10 '16 at 12:12
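A minimal sketch of that log-space forward pass, assuming logP and logQ are the elementwise logs of P_star and Q_star (so no hard zero probabilities), with the log-sum-exp trick to avoid underflow:

logP = log(P_star);
logQ = log(Q_star);
logalpha = -inf(T,n);
logalpha(1,:) = log(pi_star) + logQ(:,1)';              % log of alpha_1(i) = pi_i * b_i(Obs(1))
for t = 2:T
    for j = 1:n
        terms = logalpha(t-1,:) + logP(:,j)';           % log( alpha_{t-1}(i) * a(i,j) ) for all i
        m = max(terms);                                 % log-sum-exp: factor out the largest term
        logalpha(t,j) = m + log(sum(exp(terms - m))) + logQ(j,t);
    end
end
m = max(logalpha(T,:));
LL = m + log(sum(exp(logalpha(T,:) - m)));              % log-likelihood of the whole sequence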
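And a minimal sketch of that stopping rule; tol, maxIter and the 5-iteration patience are values you choose, and LL can be obtained from the scaled forward pass as -sum(log(c)) if the scale factors c(t) are stored:

tol = 1e-6;             % acceptable improvement threshold (your choice)
maxIter = 500;          % safety cap on the number of iterations (your choice)
LL_old = -inf;
stall = 0;              % counts consecutive iterations with too little improvement
for iter = 1:maxIter
    % ... one full Baum-Welch update (forward, backward, gamma, kesi, M-step) ...
    LL_new = -sum(log(c));              % log-likelihood from the stored scale factors
    if LL_new - LL_old < tol
        stall = stall + 1;
        if stall >= 5, break; end       % stopped improving for 5 iterations in a row
    else
        stall = 0;
    end
    LL_old = LL_new;
end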

0 Answers