HMM - Training data and format

Question

I want to implement an HMM (Hidden Markov Model) to recognize particular words. So far, I have extracted the MFCCs (Mel-frequency cepstral coefficients) of the signal. Is this suitable data for training the HMM?

Also, is the format (below) correct for training the HMM?

The format:

For each sample there is a sequence of MFCC coefficients. Two of these samples are shown as an example:

-13.8033 0.645476 3.2174 -0.625136 -0.470134 -2.96368 0.701151 0.464246 1.1898 -1.88515 0.0805242 0.311573 0.732487

-19.4252 -5.65454 0.853437 0.317219 0.146167 -1.93742 0.381944 -2.01793 -0.561144 -0.896783 -0.105491 -1.06504 -0.797318

Hope someone can help :)

Those values look reasonable for MFCC coefficients, but it is hard to look at a pair of samples and know they are correct. My suggestion is to just train the model and see how it performs. — user1955591, Feb 09 '13 at 11:50
@user1955591 Could I use the Viterbi algorithm to find the best match between the trained models and the input? For example, if I am identifying "Yes" or "No", I train the HMM with these two words, then compare the input values against the training using the Viterbi algorithm? Hope you reply. Thank you — Phorce, Feb 11 '13 at 00:11
Yes, compute the best score through each HMM using the Viterbi algorithm and pick the best-scoring HMM. Are you using a toolkit such as HTK? — Paul Dixon, Feb 11 '13 at 04:38
@PaulDixon Thank you for the reply. I am *kind of* understanding it a lot better. I'm using http://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf and https://github.com/liyanghua/A-Simple-implementation-of-Hidden-Markov-Models/blob/master/hmm.cpp as a guide; however, I am not allowed to use toolkits — Phorce, Feb 11 '13 at 12:05
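
The scoring step discussed in the comments (run the input through each word's HMM with Viterbi and pick the best-scoring model) can be sketched as below. This is only a minimal illustration, assuming discrete observation symbols and models that are already trained; the helper names (`make_model`, `viterbi_log_score`, `classify`) and the toy "yes"/"no" parameters are made up for the example and do not come from the thread. Log probabilities are used to avoid underflow on long sequences.

```python
import math

def make_model(pi, A, B):
    """Convert plain probabilities to log-probabilities (hypothetical helper)."""
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    return ([log(p) for p in pi],
            [[log(p) for p in row] for row in A],
            [[log(p) for p in row] for row in B])

def viterbi_log_score(obs, log_pi, log_A, log_B):
    """Log-probability of the single best state path for a discrete HMM.

    obs    : list of observation symbol indices
    log_pi : log initial state probabilities
    log_A  : log_A[i][j] = log P(next state j | current state i)
    log_B  : log_B[j][k] = log P(symbol k | state j)
    """
    n_states = len(log_pi)
    # Initialisation: best score of ending in state j after the first symbol.
    delta = [log_pi[j] + log_B[j][obs[0]] for j in range(n_states)]
    # Recursion: extend the best path into each state, one symbol at a time.
    for symbol in obs[1:]:
        delta = [
            max(delta[i] + log_A[i][j] for i in range(n_states)) + log_B[j][symbol]
            for j in range(n_states)
        ]
    return max(delta)

def classify(obs, models):
    """Return the label of the model with the highest Viterbi score."""
    return max(models, key=lambda word: viterbi_log_score(obs, *models[word]))

# Toy two-state, two-symbol models (made-up numbers, not trained on real data).
A = [[0.7, 0.3], [0.4, 0.6]]
models = {
    "yes": make_model([0.6, 0.4], A, [[0.9, 0.1], [0.8, 0.2]]),
    "no":  make_model([0.6, 0.4], A, [[0.1, 0.9], [0.2, 0.8]]),
}
print(classify([0, 0, 1, 0], models))
```

Training each model (for example with Baum-Welch, as described in the linked tutorial) is a separate step that this sketch does not cover.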

score 1 · Answer 1 · edited May 23 '17 at 11:49

There are two approaches.

One is to apply vector quantization to those vectors, converting the continuous MFCC vectors into discrete observations for the HMM.
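
As a rough sketch of the quantization step, each MFCC frame can be replaced by the index of its nearest codeword. The codebook here is assumed to exist already (it is typically learned with a clustering method such as k-means, which is not shown), and the function names and the 2-dimensional toy vectors are hypothetical; real MFCC frames like the ones in the question would have 13 dimensions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword.

    The resulting list of integers is the discrete observation
    sequence that a discrete HMM can be trained on.
    """
    return [
        min(range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]

# Toy codebook with two codewords (made-up values for illustration).
codebook = [[0.0, 0.0], [10.0, 10.0]]
frames = [[1.0, -1.0], [9.0, 11.0], [0.5, 0.2]]
print(quantize(frames, codebook))  # [0, 1, 0]
```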

The other is to train the HMM with a continuous approach. You can see more in this thread:

Simple speech recognition from scratch
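
To illustrate what changes in the continuous case: instead of looking up a symbol in a discrete emission table, each state scores the raw MFCC vector with a probability density. The answer does not say which density to use; a diagonal-covariance Gaussian is assumed here purely as an example (Gaussian mixtures are also common), and the function name and parameters are hypothetical.

```python
import math

def diag_gaussian_log_pdf(x, mean, var):
    """Log-density of vector x under a Gaussian with diagonal covariance.

    x, mean, var are equal-length lists; var holds per-dimension variances.
    In a continuous HMM this value would play the role of the log emission
    probability of one state for one frame.
    """
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

# Made-up state parameters; a frame close to the mean scores higher.
mean, var = [0.0, 1.0], [1.0, 2.0]
near = diag_gaussian_log_pdf([0.1, 0.9], mean, var)
far = diag_gaussian_log_pdf([5.0, -4.0], mean, var)
print(near > far)  # True
```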

answered Jun 20 '14 at 19:33

jessica
