3-state phone model in Hidden Markov Model (HMM)

Question

I want to ask about the meaning of the 3-state phone model in an HMM. The question is based on HMM theory as used in speech recognition systems, so the example concerns the acoustic modeling of speech sounds with HMMs.

I got this example picture from a journal paper: http://www.intechopen.com/source/html/41188/media/image8_w.jpg

Figure 1: 3-State HMM for the sound /s/

So, my question is:

What is meant by 3 states?
What do S1, S2 and S3 actually mean? (I know they are states, but what do they represent?)
How is the /s/ sound represented in these HMM states?
Why 3? What happens if we have 4, 5 or more states?
If /s/ is only a simple consonant sound, what do the states and transitions represent?

Do you guys have a simple explanation, with an example (a graphical analogy), of this theory?

Thank you

Nick

Belongs to http://dsp.stackexchange.com – Nikolay Shmyrev, Jan 23 '15 at 15:55

score 4 · Accepted Answer · answered Jan 23 '15 at 15:50

What is meant by 3 states?

The model that describes the phone /s/ consists of three states: S1, S2 and S3.

What do S1, S2 and S3 actually mean? (I know they are states, but what do they represent?)

S1 represents the probability distribution of the feature vectors at the beginning of the phone /s/, S2 in the middle, and S3 at the end. A probability distribution here is essentially the most probable value of the feature vector (how this part of the phone sounds) together with its variation (the range over which it varies); for a single Gaussian, these are its mean and variance.
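To make the "most probable value plus variation" idea concrete, here is a minimal sketch that assumes each state's distribution is a single Gaussian with diagonal covariance over hypothetical 2-dimensional feature vectors. The means and variances are made-up numbers, not trained values; real recognizers use longer feature vectors (e.g. MFCC-based) and usually Gaussian mixtures per state.

```python
import math

# Hypothetical 2-D feature vectors; real systems use longer vectors (e.g. MFCCs).
# Each state of the /s/ model has its own diagonal Gaussian:
# "mean" = most probable value, "var" = how much the features vary.
STATES = {
    "S1": {"mean": [1.0, 4.0], "var": [0.5, 1.0]},  # beginning of /s/
    "S2": {"mean": [3.0, 6.0], "var": [0.3, 0.5]},  # stable middle of /s/
    "S3": {"mean": [2.0, 3.0], "var": [0.5, 1.0]},  # end of /s/
}


def log_gaussian(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def best_state(frame):
    """Name of the state whose distribution gives this frame the highest likelihood."""
    return max(STATES, key=lambda s: log_gaussian(frame, STATES[s]["mean"], STATES[s]["var"]))


if __name__ == "__main__":
    frame = [2.9, 5.8]  # a frame that sounds like the middle of /s/
    for name, params in STATES.items():
        print(name, round(log_gaussian(frame, params["mean"], params["var"]), 2))
    print("best:", best_state(frame))  # best: S2
```

Each state only answers "how well does this frame match my part of the phone"; it is the combination of the three states that describes the whole sound.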

How is the /s/ sound represented in these HMM states?

The /s/ sound is represented by the whole HMM, not by a single state.
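A rough sketch of what "the whole HMM represents /s/" means in practice: the forward algorithm sums over every way of walking S1 → S2 → S3 through a sequence of frames and returns the probability of that sequence under the model. The 1-D Gaussian emissions, the transition probabilities and the exit probability below are hypothetical values chosen for illustration, not taken from the figure.

```python
import math

# Hypothetical 1-D Gaussian (mean, variance) for each state: S1, S2, S3.
EMISSIONS = [(1.0, 0.5), (3.0, 0.3), (2.0, 0.5)]

# TRANS[i][j] = P(next state j | current state i). Left-to-right: a state can
# only repeat itself or move one step forward (no skips, no going back).
TRANS = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 0.6],
]
EXIT = 0.4  # assumed probability of leaving S3 (the phone) after the last frame


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def forward_likelihood(frames):
    """P(frames | /s/ model): start in S1, finish in S3, then exit the phone."""
    n = len(TRANS)
    # alpha[j] = P(frames seen so far, and currently in state j)
    alpha = [0.0] * n
    alpha[0] = gaussian_pdf(frames[0], *EMISSIONS[0])  # must start in S1
    for x in frames[1:]:
        alpha = [
            sum(alpha[i] * TRANS[i][j] for i in range(n)) * gaussian_pdf(x, *EMISSIONS[j])
            for j in range(n)
        ]
    return alpha[n - 1] * EXIT  # must finish in S3


if __name__ == "__main__":
    print(forward_likelihood([1.0, 1.0, 3.0, 3.0, 2.0]))  # frames in the expected order
    print(forward_likelihood([2.0, 3.0, 3.0, 1.0, 1.0]))  # same values, reversed order
```

In a simple isolated-phone classifier, one would compute this likelihood under each phone's model and pick the largest. Production code works in log space or rescales `alpha` at each frame, because multiplying many small probabilities underflows on long sequences.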

Why 3? What happens if we have 4, 5 or more states?

In continuous speech recognition, the acoustics of a phone are affected by the preceding and the succeeding phonemes. For that reason it is more precise to split each phone into 3 parts: the transition from the previous phone at the beginning, a stable middle, and the transition to the next phone at the end. If the phone were isolated and stable, 1 state would be enough. It is also possible to use 5 states per phone in continuous speech, but that doesn't greatly improve accuracy.
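One way to see the beginning/middle/end split is a Viterbi alignment, which assigns each frame to the single most likely state. This is a minimal sketch under stated assumptions: 1-D features, one Gaussian per state with a shared variance, a shared self-loop probability, and a made-up frame sequence that ramps into /s/, holds steady, and ramps out. The function names are hypothetical helpers for illustration only.

```python
import math


def left_to_right(n_states, self_loop=0.6):
    """Transition matrix in which state i either repeats or moves on to state i + 1.

    The last row sums to self_loop; the remainder is the implicit probability
    of leaving the phone.
    """
    trans = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        trans[i][i] = self_loop
        if i + 1 < n_states:
            trans[i][i + 1] = 1.0 - self_loop
    return trans


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_states(frames, means, var=0.5, self_loop=0.6):
    """Most likely state index per frame, starting in state 0 and ending in the last state."""
    n = len(means)
    if len(frames) < n:
        raise ValueError("a left-to-right model without skips needs at least one frame per state")
    log_t = [[math.log(p) if p > 0 else float("-inf") for p in row]
             for row in left_to_right(n, self_loop)]
    score = [float("-inf")] * n
    score[0] = log_gauss(frames[0], means[0], var)  # must start in the first state
    backpointers = []
    for x in frames[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log_t[i][j])
            new_score.append(score[best_i] + log_t[best_i][j] + log_gauss(x, means[j], var))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    state = n - 1  # must end in the last state
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    frames = [1.0, 1.2, 2.8, 3.1, 3.0, 2.9, 2.1, 1.9]  # hypothetical /s/ trajectory
    print(viterbi_states(frames, means=[1.0, 3.0, 2.0]))  # [0, 0, 1, 1, 1, 1, 2, 2]
```

Passing five means instead of three would split the same frames into five segments; with a single mean (one state), every frame is forced into one distribution, which only fits well when the sound is stable throughout.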

If /s/ is only a simple consonant sound, what do the states and transitions represent?

See above. A transition represents the probability of moving from one state to another; essentially, the transitions model the duration of the phone.
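The duration claim can be checked with a little arithmetic. If a state's self-loop probability is p, the number of frames spent in that state follows a geometric distribution, P(d) = p^(d-1) (1 - p), whose mean is 1 / (1 - p). A minimal sketch, assuming hypothetical self-loop probabilities, a left-to-right model with no skip transitions, and a 10 ms frame shift (a common choice, but an assumption here):

```python
FRAME_SHIFT_MS = 10.0  # assumed frame shift


def duration_pmf(self_loop, d):
    """P(staying exactly d frames in a state with the given self-loop probability)."""
    return self_loop ** (d - 1) * (1.0 - self_loop)


def expected_frames(self_loop):
    """Mean of the geometric duration distribution: 1 / (1 - self_loop)."""
    return 1.0 / (1.0 - self_loop)


def expected_phone_ms(self_loops, frame_shift_ms=FRAME_SHIFT_MS):
    """Expected phone duration: sum of per-state expected frames times the frame shift."""
    return sum(expected_frames(p) for p in self_loops) * frame_shift_ms


def min_phone_ms(self_loops, frame_shift_ms=FRAME_SHIFT_MS):
    """Without skips, every state emits at least one frame."""
    return len(self_loops) * frame_shift_ms


if __name__ == "__main__":
    loops = [0.6, 0.7, 0.6]  # hypothetical self-loops for S1, S2, S3
    print(round(expected_phone_ms(loops), 1))  # 83.3
    print(min_phone_ms(loops))  # 30.0
```

Raising a self-loop probability lengthens the expected time in that state, which is how training adapts the model to phones that are typically long or short.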
