
I have implemented the evaluation and training algorithms for an HMM following the Rabiner tutorial, for a single observation sequence (based on MFCC data). For the forward and backward algorithms I have also included the proposed scaling to handle underflow issues. For the Baum-Welch optimization I compute the log-probability accordingly as

log(P) = -sum(log(scaling coefficients))
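
For reference, here is a minimal sketch of how the scaled forward pass produces exactly this quantity (in Python/NumPy rather than the original Matlab, and assuming the state-conditional densities B have already been evaluated):

import numpy as np

def scaled_forward_loglik(pi, A, B):
    # pi: (N,) initial state probabilities
    # A:  (N, N) transition matrix
    # B:  (T, N) precomputed state-conditional densities b_j(O_t)
    # returns log P(O | model) = -sum_t log(c_t), c_t being the scaling coefficients
    T, N = B.shape
    log_c = np.zeros(T)
    alpha = pi * B[0]
    c = 1.0 / alpha.sum()          # scaling coefficient for t = 1
    alpha *= c
    log_c[0] = np.log(c)
    for t in range(1, T):
        alpha = (alpha @ A) * B[t]
        c = 1.0 / alpha.sum()      # scaling coefficient for t
        alpha *= c
        log_c[t] = np.log(c)
    return -log_c.sum()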

The Baum-Welch optimization seems to work fine for a few iterations until I run into an underflow while calculating the densities. I use the Matlab built-in function mvnpdf to calculate the densities. After a few iterations Matlab starts to set the lowest densities to 0. In the subsequent Baum-Welch calculations these zero entries in the emission/PDF matrix yield NaNs and the optimization fails.
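
To illustrate one way this failure shows up (a minimal sketch, not the original Matlab code; the exact point of failure depends on the implementation): if every density for some time step underflows to 0, the scaling step divides by zero and NaNs propagate from there.

import numpy as np

alpha = np.array([0.3, 0.7]) * np.array([0.0, 0.0])  # all densities underflowed to 0
print(alpha / alpha.sum())  # -> [nan nan]; every later Baum-Welch quantity becomes NaN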

I'd appreciate any help on how to circumvent or avoid this problem.

Daniel

1 Answer


Old question haha. But I had the same one today. Maybe someone else will too.

I think I managed to work around this issue in my implementation. I did this by first checking whether the variances calculated in the M step will cause an underflow, and then reassigning those variances to the smallest variance that will not cause an underflow. For example, I figured out that for Python's scipy.stats multivariate_normal.pdf implementation, any data point further than approximately 37.77 standard deviations from a mean will cause an underflow. So, after the M step I'm doing something along the lines of this:

import numpy as np

# observations: 1-D array of training data
# mu: component means, shape (N, M); U: component variances, shape (N, M)
observation_min, observation_max = (min(observations), max(observations))
aprx_max_std_dev = 37.7733   # pdf underflows to 0 beyond roughly this many std devs
N, M = mu.shape              # N latent states, M Gaussians per state's mixture
for n in range(N):
    for m in range(M):
        dist_to_min = mu[n, m] - observation_min
        dist_to_max = observation_max - mu[n, m]
        max_dist_from_mean = max(dist_to_min, dist_to_max)
        # smallest variance that keeps the farthest observation within
        # aprx_max_std_dev standard deviations of this component's mean
        smallest_good_variance = np.square(max_dist_from_mean / aprx_max_std_dev)
        if smallest_good_variance > U[n, m]:
            U[n, m] = smallest_good_variance
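
As a quick sanity check of that threshold (the exact point where the pdf hits 0.0 depends on the platform's floating-point handling, so treat 37.77 as approximate), you can probe it directly:

import numpy as np
from scipy.stats import multivariate_normal

# probe how far from the mean scipy's pdf survives before underflowing to 0.0
for k in np.arange(37.0, 40.0, 0.5):
    p = multivariate_normal.pdf(k, mean=0.0, cov=1.0)
    print(f"{k:5.2f} std devs -> pdf = {p!r}")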

Also, I ran into problems where the scale coefficients (mixture weights) sometimes got too small. If a scale coefficient becomes smaller than some small value (I'm using < 0.001), logically that normal distribution probably isn't important or contributing much to the whole mixture model, so I redistribute the scale coefficients so that they all have reasonable values and still add up to one. For example, with M = 3 and coefficients (0.7992, 0.2, 0.0008), I steal some of the scale from the largest one and redistribute to (0.4, 0.2, 0.4).
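
A sketch of one way to implement that redistribution (the name redistribute_weights and the floor/target parameters are mine, not from the original code):

import numpy as np

def redistribute_weights(c, floor=0.001, target=0.4):
    # hypothetical helper: raise any mixture weight below `floor` to `target`,
    # take the needed mass from the currently largest weight, then renormalize
    c = np.asarray(c, dtype=float).copy()
    small = c < floor
    c[np.argmax(c)] -= np.sum(target - c[small])
    c[small] = target
    return c / c.sum()

print(redistribute_weights([0.7992, 0.2, 0.0008]))   # -> [0.4 0.2 0.4]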

Last of all, to try to prevent it from settling back into the same place after further EM iterations, I randomly choose a new mean, uniformly between observation_min and observation_max, for each component whose scale coefficient was very small.
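
In code that last step is just something like the following (again a sketch with my own variable names):

import numpy as np

# re-seed the mean of component (n, m) whose weight fell below the threshold
mu[n, m] = np.random.uniform(observation_min, observation_max)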

luca992