Hidden Markov Models - Identifying Phonemes

Question

I'm developing a project that identifies phonemes in order to tell whether someone is saying "Yes" or "No".

So far I have used zero-crossings to identify what the person is saying. This works really well and is simple enough to understand. However, the project needs a few enhancements and has to be developed using a Hidden Markov Model.

My question is this:

I want to develop a Hidden Markov Model without discarding the work I have already completed. That is, I currently strip out the data that does not warrant consideration by counting the zero-crossings in each block and summing the block's values.

I do not understand what data I would need to train the HMM with in order to identify these phonemes. For example:

With zero-crossings I have identified that:

Yes - Zero-crossings start low and then the value increases.

No - Zero-crossings start low and then do not increase in value.

Could I train my HMM algorithm so that it interprets these values?

Or could anyone suggest a method by which I can train the HMM to identify the word that is spoken in the input sample?

Hope someone can help :)!

An HMM seems like overkill for your situation. Have you thought about, say, logistic regression? – Bjorn Roche Nov 16 '12 at 17:54
@Bjorn Roche - Hey, it has to be an HMM. It's what my project is based on. – Phorce Nov 16 '12 at 21:17

score 2 · Accepted Answer · answered Nov 17 '12 at 07:49

Could I train my HMM algorithm so that it interprets these values?

Yes, definitely.

Or could anyone suggest a method by which I can train the HMM to identify the word that is spoken in the input sample?

You just need to put the zero-crossing rate into a feature file together with the MFCC features, for example as a 14th feature in each frame, and then use any standard HMM training toolkit such as CMUSphinx or HTK to train the HMM and decode with it. For more information see

http://cmusphinx.sourceforge.net/wiki/mfcformat

or

http://speech-research.com/htkSearch/index.php?ID=297039

http://speech-research.com/SRTxt2User/index.html
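
As a rough illustration of the feature-file idea above, here is a minimal Python sketch that counts zero-crossings per frame and appends them as one extra column to a matrix of per-frame features. The function names, the frame length and hop size, and the all-zero 13-column MFCC placeholder are assumptions made for this example; they are not part of CMUSphinx or HTK. Writing the resulting matrix in a toolkit's own feature file format is a separate, toolkit-specific step that is not shown.

```python
import numpy as np

def zero_crossing_counts(signal, frame_len=256, hop=128):
    """Count sign changes in each frame of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    counts = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        signs = np.signbit(frame)
        # A zero-crossing is a change of sign between neighbouring samples.
        counts.append(int(np.count_nonzero(signs[1:] != signs[:-1])))
    return np.array(counts)

def append_zcr(frame_features, zcr):
    """Append zcr as an extra column to an (n_frames, n_coeffs) matrix."""
    frame_features = np.asarray(frame_features, dtype=float)
    zcr = np.asarray(zcr, dtype=float)
    if frame_features.shape[0] != zcr.shape[0]:
        raise ValueError("features and ZCR must have the same number of frames")
    return np.hstack([frame_features, zcr[:, None]])

# Toy input: one second of a tone whose frequency rises from 100 Hz to 1900 Hz,
# so the zero-crossing count starts low and increases.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * (100 * t + 900 * t ** 2))

zcr = zero_crossing_counts(signal)
mfcc = np.zeros((len(zcr), 13))   # placeholder only; real MFCCs would go here
features = append_zcr(mfcc, zcr)  # one row per frame, ZCR in the last column
print(features.shape)
```

The frames used for the zero-crossing count have to line up with the frames used for the MFCCs (same frame length and hop), otherwise the two sets of features cannot be stacked row by row.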

Nikolay Shmyrev

Thanks for your reply :) Just to confirm: for 'yes' the zero-crossing counts start low and then increase, whereas for 'no' they start low and do not increase, so I could train the HMM on those two patterns. Would this work? Also, is there any way I can expand this and use a DFT to train the HMM? Thanks again! – Phorce Nov 17 '12 at 12:18
You can use an HMM to detect the time patterns in the zero-crossing rate and distinguish 'yes' from 'no' using the zero-crossing feature value. I'm not sure what you mean by "expand this". Please elaborate. The DFT and HMM training are not really related things. – Nikolay Shmyrev Nov 17 '12 at 18:35
Hello, I just got back today (I was working on my tablet, so my response was a little off). OK, so basically the zero-crossing result has a pattern (it increases for 'yes' and stays flat for 'no'), BUT if I transform these values with an FFT, could I then train the HMM with the transformed values? Instead of being trained on values like (12, 14, 53, 64), it would be trained on the FFT values. Thanks :) – Phorce Nov 19 '12 at 15:56
You do not need to transform the zero-crossing values with an FFT; an FFT only makes sense for the audio data itself. You can use the zero-crossing rate *together* with FFT-like data such as the Mel-cepstrum to improve detection. Or you can use the zero-crossing rate alone; it is then just a matter of converting the zero-crossing rate data to a standard feature file format. – Nikolay Shmyrev Nov 19 '12 at 18:44
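
To make the "time pattern" point from the comments concrete, here is a small Python sketch that scores a sequence of zero-crossing counts against two tiny two-state HMMs with Gaussian emissions, using the forward algorithm in the log domain, and picks the better-scoring word. Everything numeric here is an assumption chosen by hand for illustration (the means, variances, start and transition probabilities, and the `MODELS` and `classify` names); in practice a toolkit such as CMUSphinx or HTK would estimate these parameters from training recordings.

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian, evaluated element-wise."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def log_likelihood(obs, start, trans, means, variances):
    """Forward algorithm in the log domain for a Gaussian-emission HMM."""
    obs = np.asarray(obs, dtype=float)
    log_b = log_gauss(obs[:, None], means[None, :], variances[None, :])  # (T, N)
    alpha = np.log(start) + log_b[0]
    for t in range(1, len(obs)):
        # Sum over previous states, then apply the emission of frame t.
        alpha = np.logaddexp.reduce(alpha[:, None] + np.log(trans), axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

# Hand-picked, hypothetical parameters (not trained):
# "yes" moves from a low-ZCR state to a high-ZCR state, "no" stays low.
start = np.array([0.95, 0.05])
trans = np.array([[0.90, 0.10],
                  [0.05, 0.95]])
MODELS = {
    "yes": (np.array([10.0, 60.0]), np.array([100.0, 400.0])),
    "no":  (np.array([10.0, 15.0]), np.array([100.0, 100.0])),
}

def classify(zcr_sequence):
    """Return the word whose HMM gives the higher log-likelihood."""
    scores = {
        word: log_likelihood(zcr_sequence, start, trans, means, variances)
        for word, (means, variances) in MODELS.items()
    }
    return max(scores, key=scores.get)

print(classify([12, 14, 53, 64]))  # rising counts, as in the question's example
print(classify([12, 14, 11, 13]))  # counts that stay low
```

With one model per word, recognition reduces to comparing likelihoods; adding MFCC columns alongside the zero-crossing count would change the emissions from one-dimensional to multi-dimensional Gaussians but leave the comparison step the same.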