
I am trying to adapt a monophone-based recogniser to a specific speaker, following the recipe given in the HTKBook (3.4.1), section 3.6.2. I am getting stuck on the HHEd step, which I am invoking like so:

HHEd -A -D -T 1 -H hmm15/hmmdefs -H hmm15/macros -M classes regtree.hed monophones1eng

The error I end up with is as follows:
ERROR [+999] Components missing from Base Class list (2413 3375)
ERROR [+999] BaseClass check failed

The folder classes contains the file global which has the following contents:
~b "global"
<MMFIDMASK> *
<PARAMETERS> MIXBASE
<NUMCLASSES> 1
<CLASS> 1 {*.state[2-4].mix[1-25]}

The hmmdefs file within hmm15 has some mixture components missing (I am using 25 mixture components per state of each phone). I tried to "fill in the blanks" by adding mixture components with random mean and variance values but zero weights. This has had no effect either.

The HMMs are left-to-right with 5 states (3 emitting); each emitting state is modelled by a 25-component mixture, and each component is an MFCC vector with energy, delta, and acceleration (E_D_A) coefficients. There are 46 phones in all.
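For reference, this is roughly the shape each fully-populated state should have in hmmdefs (the mixture weight shown is a placeholder, and the vector size of 39 assumes 12 cepstra plus energy, with deltas and accelerations):

```
<STATE> 2
  <NUMMIXES> 25
  <MIXTURE> 1 4.000000e-02
    <MEAN> 39
      ... 39 mean values ...
    <VARIANCE> 39
      ... 39 variance values ...
  <MIXTURE> 2 ...
```

The base class pattern `{*.state[2-4].mix[1-25]}` expects every one of these 25 `<MIXTURE>` entries to be present in states 2-4 of every model.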

My questions are:
1. Is the way I am invoking HHEd correct? Can it be invoked in the above manner for monophones?
2. I know that the base class list (rtree.base) must contain every single mixture component, but where do I find these missing mixture components?

NOTE: Please let me know in case more information is needed.

Edit 1: The file regtree.hed contains the following:

RN "models"
LS "stats_engOnly_3_4"
RC 32 "rtree"

Thanks,
Sriram


1 Answer


The way you invoke HHEd looks fine. The components are missing because they have become defunct. To deal with defunct components, read HTKBook 3.4.1, Section 8.4, page 137.

Questions:
- What does regtree.hed contain?
- How much data (in hours) are you using? 25 mixtures might be excessive.

You might want to use a more gradual increase in mixtures - MU +1 or MU +2 per round - and limit the total number of mixtures (a guess: 3-8, depending on the amount of training data).
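For example, a gradual mixup round could use an HHEd edit script like the following (mixup.hed is a hypothetical filename; the item list mirrors the one in your global base class file):

```
MU +2 {*.state[2-4].mix}
```

invoked as `HHEd -H hmm15/hmmdefs -H hmm15/macros -M hmm16 mixup.hed monophones1eng`, with two or three HERest re-estimation passes between successive increases.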

Neil
  • Thanks for the reply. I have added `regtree.hed` to the question (see Edit 1 above). Also, the data the HMMs have been trained on is considerable: the entire WSJ British English database (20k sentences), around 300 sentences of American English, and some 500-600 sentences of Indian English (all of the above refers to accents which lend some words different pronunciations). I have been training the HMMs from the WSJ database onwards. The HMMs I got already had 25 mixtures. Is there any way to reduce them? – Sriram Jul 09 '11 at 08:42