
I'm implementing the forward algorithm for HMMs to calculate the probability of a given HMM emitting a given observation sequence. I'd like my algorithm to be robust to underflow. I can't work in log-space because the forward algorithm requires the multiplication AND addition of probabilities. What is the best way to avoid underflow?

I've read some sources about this, but the best suggestion I've found is to scale the probabilities at each time step (see Section 6 here). With that approach, by the end of the algorithm you aren't left with the exact probability you want (that of the observation sequence). Also, unless I'm mistaken, if you scale the probabilities at each time step as proposed in the above reference, you can't meaningfully compare the probabilities of a given observation sequence under two different HMMs (to figure out which one is more likely to have produced the sequence). Any suggestions?

akobre01

1 Answer


In equation 32 at the end of your reference, every probability value alpha_t(i) is multiplied by C_t. So by the end you have multiplied your final probabilities by the product of all the C_t. You can account for this by keeping a running sum of log(C_t). Then at the end, taking alpha_t(i) to be the scaled value, log(alpha_t(i)) - SUM_(j <= t) log(C_j) gives the log of the true (unscaled) alpha_t(i), and log(SUM_i alpha_t(i)) - SUM_(j <= t) log(C_j), with the sum taken over states i at the final time step, gives the log probability of the entire sequence. These are true log probabilities, so they can be compared directly between different HMMs.
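A minimal sketch of that bookkeeping, assuming a discrete-emission HMM given as plain Python lists. The function name `forward_log_prob` and the parameter names `pi`, `A`, `B`, and `obs` are illustrative and not taken from the reference. It takes C_t = 1 / SUM_i alpha_t(i), so the scaled values sum to 1 at every step:

```python
import math

def forward_log_prob(pi, A, B, obs):
    """Return log P(obs | model) using the scaled forward algorithm.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability that state i emits symbol k
    obs     : non-empty list of observed symbol indices
    (All names here are hypothetical, chosen for this sketch.)
    """
    n = len(pi)
    # Unscaled alpha at the first time step.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_c_sum = 0.0  # running SUM_(j <= t) log(C_j)
    for t in range(len(obs)):
        if t > 0:
            # Induction step, computed from the already-scaled alpha at t-1.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        total = sum(alpha)
        if total == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        # Scale by C_t = 1 / total and record log(C_t) = -log(total).
        alpha = [a / total for a in alpha]
        log_c_sum += -math.log(total)
    # The scaled alphas sum to 1, so the first term is (numerically) zero.
    return math.log(sum(alpha)) - log_c_sum
```

To decide which of two HMMs is more likely to have produced a sequence, call the function once per model and compare the two returned log probabilities. Nothing is exponentiated along the way, so the result stays finite even for sequences thousands of symbols long.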

mcdowella
  • I may be resurrecting the dead here, but the forward induction step depends on `i`, so in your dynamic programming matrix you have `alpha[i][t]` at the `t + i * length of sequence`-th step but don't yet have `alpha[i+1][t]`, which makes it impossible to calculate $C_t$. Or do you just run the forward algorithm and scale at the end? – Vahagn Tumanyan Apr 19 '17 at 10:59
  • The calculations work out alpha[i][t+1] (for all values of i) using alpha[i][t] (for all i) and other information calculated for time t. The alpha[i][t] values here will already have been scaled by C_t when underflow is a worry. After calculating alpha[i][t+1] for all i, we can use these values to calculate C_{t+1} and then use that to calculate the scaled values of alpha[i][t+1]. C_{t+1} is calculated last, from the unscaled values, and is not needed until it is used to scale the alpha values. (Remember that i varies in an inner loop, and t in an outer loop.) – mcdowella Apr 19 '17 at 17:14