
I'm trying to adapt the WSJ model to understand only 4 words from me. I have created a bash file and I've tried adapting nearly 20 times, but when I run the demo and say "stop", it fails up to 90% of the time. Here's my bash file. Please let me know: am I doing anything wrong, or do I need to train it much more, like 100 times?

#!/bin/bash

# Record one utterance per prompt line of arctic20.txt
for i in {1..4}
do
    fn=$(printf arctic_%04d $i)
    read sent; echo $sent          # show the prompt to read aloud
    # record 16 kHz, 16-bit, mono audio with sox's rec
    rec -r 16000 -e signed-integer -b 16 -c 1 $fn.wav 2>/dev/null
done < arctic20.txt

# Extract MFCC features from the recorded wav files
sphinx_fe -argfile Model/feat.params \
   -samprate 16000 -c arctic20.fileids -di . -do . \
   -ei wav -eo mfc -mswav yes



# Collect adaptation statistics with the Baum-Welch tool
bw/bw \
   -hmmdir Model \
   -moddeffn Model/mdef \
   -ts2cbfn .cont. \
   -feat 1s_c_d_dd \
   -cmn current \
   -agc none \
   -dictfn arctic20.dic \
   -ctlfn arctic20.fileids \
   -lsnfn arctic20.transcription \
   -accumdir .



# Copy the baseline model, then write the adapted parameters into the copy
mkdir -p Model.adapted
cp -a Model/* Model.adapted

# MAP-adapt the model parameters using the accumulated statistics
map_adapt/map_adapt \
    -meanfn Model/means \
    -varfn Model/variances \
    -mixwfn Model/mixture_weights \
    -tmatfn Model/transition_matrices \
    -accumdir . \
    -mapmeanfn Model.adapted/means \
    -mapvarfn Model.adapted/variances \
    -mapmixwfn Model.adapted/mixture_weights \
    -maptmatfn Model.adapted/transition_matrices

# Install the adapted model where the Sphinx4 demo expects the WSJ model
cp -r Model.adapted/* ~/NetBeansProjects/sphinx4-1.0beta6/models/acoustic/wsj


# Overwrite the baseline model with the adapted one before the next run
cp -r Model.adapted/* Model

And I'm running it over and over again. Then I Clean and Build the project and run the HelloWorld demo; I modified the .gram file there. By the way, the transcription is:

<s> stop </s> (arctic_0001)
<s> left </s> (arctic_0002)
<s> right </s> (arctic_0003)
<s> go </s> (arctic_0004)

The dictionary and fileids are also OK.
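For reference, the supporting files look roughly like this (the names and pronunciations below are illustrative; the pronunciations are just the standard cmudict ones, and the grammar is a minimal JSGF sketch):

arctic20.fileids:

arctic_0001
arctic_0002
arctic_0003
arctic_0004

arctic20.dic:

stop  S T AA P
left  L EH F T
right R AY T
go    G OW

the .gram file (JSGF):

#JSGF V1.0;
grammar commands;
public <command> = stop | left | right | go;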

Thanks

P.S. Thanks to Dariusz, but it still doesn't work.

Zhani Baramidze

2 Answers


It is very difficult to determine what goes wrong in such a complex process.

What you should do is set up a repeatable test case and use it to verify your progress. It should contain at least 100 test sentences (words, in your case). This can be done with Sphinx, see this link; a minimal sketch of such a batch test is shown below.
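For example, with the pocketsphinx command-line tools it could look something like this (the file names here are placeholders; point -hmm at the model you want to evaluate, and use -jsgf instead of -lm if you are testing against a grammar):

pocketsphinx_batch \
    -adcin yes \
    -cepdir wav \
    -cepext .wav \
    -ctl test.fileids \
    -lm test.lm \
    -dict test.dic \
    -hmm Model.adapted \
    -hyp test.hyp

word_align.pl test.transcription test.hyp

word_align.pl (shipped with SphinxTrain) prints the word error rate, which gives you a single number to compare before and after every change.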

Only after you have the test ready should you proceed to make changes to the acoustic model or grammar. Compare the accuracy of each change you make against the original (unmodified) model. Then you will know which steps are good and which are bad.

Another thing is the training data - I may be wrong, but I think that such short one-word audio files are not the best for adapting the model. I would suggest using longer files, even if it means repeating the same word several times. Just make sure you speak exactly what the transcription says and leave clear pauses between the words.

Dariusz
  • You are wrong about short audio files, unfortunately. The problem is not the size; if the final application will recognize short words, it's fine to adapt with short words too. – Nikolay Shmyrev Oct 04 '13 at 17:43
  • @NikolayShmyrev I am not saying adapting for short words is bad - I am saying that SphinxTrain may have trouble handling training from very short audio files. I have no proof, but that was my impression when I was working on adapting (and later creating) my own acoustic model. Longer training sequences (a whole sentence, for example) yielded better results. – Dariusz Oct 04 '13 at 21:38
  • That is only your impression. If you adapt to recognize short commands you need to use short commands in adaptation data. – Nikolay Shmyrev Oct 04 '13 at 21:55

You shouldn't use MAP at all. MAP adaptation requires a much larger amount of adaptation data, as it modifies all the parameters in the model. You will have a better chance with MLLR, which is available in Sphinx; here's the tutorial: http://cmusphinx.sourceforge.net/wiki/tutorialadapt
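Roughly, you would keep the bw step from your script and replace the map_adapt step with mllr_solve, along these lines (a sketch based on that tutorial; mllr_matrix is just a placeholder name, and the path to mllr_solve depends on where your SphinxTrain binaries live, like your bw/bw and map_adapt/map_adapt):

mllr_solve \
    -meanfn Model/means \
    -varfn Model/variances \
    -outmllrfn mllr_matrix \
    -accumdir .

The resulting transform file is then passed to the decoder instead of copying a whole adapted model around; pocketsphinx takes it with the -mllr option, and you would have to check whether your Sphinx4 version can load an MLLR transform.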

Ben Jiang