Do I need to add updated phoneme sequences of words to the .dict file while adapting an acoustic model using CMUSphinx?

Question

I am trying to adapt the en-us acoustic model with Indian English accent recordings. Since many words are pronounced differently in this accent, do I need to add updated phoneme representations of those words to the dictionary? Currently I am following this tutorial: https://cmusphinx.github.io/wiki/tutorialadapt/#accumulating-observation-counts and it does not mention updating the .dict file.

PS: Should I add new words directly in the dictionary?

score 0 · Accepted Answer · answered Apr 10 '19 at 09:03


There is an Indian English model in the downloads; you should use it instead. It comes with an Indian English dictionary.


Nikolay Shmyrev


Thanks, I will check it. What if I also want to add some new words to that model? – Sumit Jangra Apr 10 '19 at 11:59
One more thing: can we train a model on en-us children's data and then use it for en-in children? Would it work with good accuracy, or do we need en-in children's data specifically? Thanks in advance. – Sumit Jangra Apr 11 '19 at 05:14
For the best accuracy you have to train on en-in children data. – Nikolay Shmyrev Apr 11 '19 at 07:57
Suppose I only want to recognize about 50 basic words; how much data is required? Right now I have only 50 recordings of each word. I have two options: 1) adapt the model, or 2) train a new model, but I don't know if I have enough data to train and build a model. – Sumit Jangra Apr 16 '19 at 11:09
You have to train to recognize children's voices. You need 50 hours of speech to train a good system. – Nikolay Shmyrev Apr 16 '19 at 13:08
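
On the follow-up about adding new words, which the thread leaves unanswered: a minimal sketch, assuming the dictionary uses the usual CMUdict-style plain-text layout (one `word PH1 PH2 ...` entry per line, with alternate pronunciations written as `word(2)`, `word(3)`, and so on). The helper names and the example pronunciations are hypothetical and only for illustration; every phone you use has to exist in the phone set of the acoustic model you pair the dictionary with.

```python
import re
from pathlib import Path


def count_variants(dict_lines, word):
    """Count existing entries for `word`, including `word(2)`, `word(3)`, ..."""
    pattern = re.compile(rf"^{re.escape(word)}(\(\d+\))?$")
    return sum(
        1 for line in dict_lines
        if line.split() and pattern.match(line.split()[0])
    )


def new_entries(dict_lines, word, pronunciations):
    """Build dictionary lines for extra pronunciations of `word`.

    Numbering continues after whatever variants are already present,
    so a first entry is plain `word` and later ones are `word(n)`.
    """
    already = count_variants(dict_lines, word)
    entries = []
    for offset, phones in enumerate(pronunciations, start=1):
        index = already + offset
        key = word if index == 1 else f"{word}({index})"
        entries.append(f"{key} {' '.join(phones)}")
    return entries


def append_to_dict(dict_path, word, pronunciations):
    """Append the new entries to the .dict file (hypothetical helper)."""
    path = Path(dict_path)
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    lines.extend(new_entries(lines, word, pronunciations))
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, `append_to_dict("my.dict", "namaste", [["N", "AH", "M", "AH", "S", "T", "EY"]])` would append one line for a new word, where both the file name and the phone sequence are placeholders to be replaced with your own.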
