3

I have read many articles on this, but I still do not understand how to proceed.

I'm trying to build a basic speech recognition system that feeds MFCC features to an HMM, using the data available here. I'm doing this in MATLAB.

So far I have extracted the MFCC vectors from the speech files using this library. What I do not understand is how to use these features with the HMM.

Can you please explain how to train the HMM? I'm using the HMM implementation found in MATLAB. Please do not refer me to other libraries, as I am actually trying to understand how HMMs work.

  • How do I initialize the transition and emission matrices?

  • I'm assuming each state emits a particular phoneme in the word. So, to train the HMM, how are we supposed to pass in the MFCC vectors?

  • What are the steps I should take to train the HMM?

The MATLAB implementation of the HMM functions is given here.

Edit: It's been a long time, but judging by the number of views this question has received, I suppose it is still relevant. I did solve this; the code can be found on my GitHub.

Jonathan Hall
Josyula Krishna

1 Answer

1

You cannot use this HMM implementation to train a speech HMM from MFCC vectors. It supports only sequences of discrete symbols (integers), not real-valued feature vectors. It is a simple discrete HMM toolbox.

You have to use a library suited to speech features, i.e. one that supports continuous observations, like this one:

http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html
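To make the distinction concrete, here is a minimal sketch (in plain Python rather than MATLAB, so it runs without any toolbox) of how the two kinds of HMM score a single observation. A discrete HMM looks up an integer symbol in an emission table; a continuous HMM evaluates a per-state density at the feature vector. The diagonal-covariance Gaussian, the variable names, and all the numbers are illustrative assumptions, not taken from either toolbox.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at the vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

# Discrete HMM: the emission model is a table indexed by [state][symbol].
discrete_B = [
    [0.7, 0.2, 0.1],  # state 0
    [0.1, 0.3, 0.6],  # state 1
]
symbol = 2                            # observations must be integers
p_discrete = discrete_B[1][symbol]    # probability that state 1 emits symbol 2

# Continuous HMM: each state holds density parameters over feature vectors.
# (Made-up parameters; a real model would use e.g. 13-dimensional MFCCs.)
states = [
    {"mean": [0.0, 0.0], "var": [1.0, 1.0]},
    {"mean": [3.0, -1.0], "var": [0.5, 2.0]},
]
frame = [2.8, -0.5]  # toy 2-dimensional stand-in for one MFCC vector
log_b = [diag_gaussian_logpdf(frame, s["mean"], s["var"]) for s in states]
best_state = max(range(len(states)), key=lambda j: log_b[j])

print(p_discrete, log_b, best_state)
```

In this toy setup the frame lies much closer to the mean of state 1, so that state gives it the higher log density. There is no emission matrix in the continuous case; the means and variances play that role.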

Nikolay Shmyrev
  • Thanks for the advice; I'm switching to Murphy's toolbox. As far as I know, the phonemes are the outputs of each state in the HMM, so I don't understand how the MFCC vectors come into play here. Can you please explain how exactly we use these features to train the HMM? – Josyula Krishna Jan 27 '15 at 13:42
  • You can read Rabiner's HMM tutorial to get a clear picture of HMMs: http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf The features are the input to the HMM algorithms. You pass in the array of feature vectors, and the algorithm assigns each feature vector to an output label (a phone) and gives you the probability of that assignment. The alignment is done in an unsupervised way. – Nikolay Shmyrev Jan 27 '15 at 21:29
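The assignment of feature vectors to states described in the last comment can be sketched with the Viterbi algorithm, which returns the most likely state for each frame together with the log-probability of that path. This is only an illustration under stated assumptions: a two-state left-to-right model, one-dimensional toy "features", diagonal Gaussian emissions, and hand-picked parameters. All names here are hypothetical. Training (Baum-Welch, as covered in Rabiner's tutorial) would re-estimate these parameters from data; this sketch only decodes with fixed ones.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """log(p), with log(0) mapped to -inf so forbidden transitions stay forbidden."""
    return math.log(p) if p > 0 else NEG_INF

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at the vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def viterbi(frames, log_pi, log_A, means, variances):
    """Return (best state path, log-probability of that path)."""
    n = len(log_pi)
    # delta[j]: best log-probability of any path that ends in state j
    delta = [log_pi[j] + log_gauss(frames[0], means[j], variances[j])
             for j in range(n)]
    backptr = []
    for x in frames[1:]:
        prev, step, delta = delta, [], []
        for j in range(n):
            i_best = max(range(n), key=lambda i: prev[i] + log_A[i][j])
            step.append(i_best)
            delta.append(prev[i_best] + log_A[i_best][j]
                         + log_gauss(x, means[j], variances[j]))
        backptr.append(step)
    # Trace the best path backwards from the best final state.
    last = max(range(n), key=lambda j: delta[j])
    path = [last]
    for step in reversed(backptr):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy left-to-right model: always start in state 0, never go back from 1 to 0.
log_pi = [safe_log(1.0), safe_log(0.0)]
log_A = [[safe_log(0.6), safe_log(0.4)],
         [safe_log(0.0), safe_log(1.0)]]
means = [[0.0], [5.0]]
variances = [[1.0], [1.0]]
frames = [[0.1], [-0.2], [0.3], [4.8], [5.1]]  # stand-ins for MFCC vectors

path, log_prob = viterbi(frames, log_pi, log_A, means, variances)
print(path, log_prob)
```

Nothing in the input says which frame belongs to which state. The alignment falls out of the model parameters and the data, which is the sense in which it is unsupervised.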