I am trying to adapt the en-us acoustic model with Indian English accent recordings. Since many words are pronounced differently in this accent, do I need to add updated phoneme representations of those words? I am currently following this tutorial: https://cmusphinx.github.io/wiki/tutorialadapt/#accumulating-observation-counts and it says nothing about updating the .dict file.

PS: Should I add new words directly to the dictionary?

Sumit Jangra

1 Answer

There is an Indian English model in the downloads; you should use it instead. It comes with an Indian English dictionary.

Nikolay Shmyrev
  • Thanks, I will check it. What if I also want to add some new words to that model? – Sumit Jangra Apr 10 '19 at 11:59
  • One more thing: can we train a model on en-us children's data and then use it for en-in children? Would it work with good accuracy, or do we need en-in children's data only? Thanks in advance. – Sumit Jangra Apr 11 '19 at 05:14
  • For the best accuracy you have to train on en-in children data. – Nikolay Shmyrev Apr 11 '19 at 07:57
  • Suppose I just want recognition of some 50 basic words; how much data is required? Right now I only have 50 recordings for each word. I have two options: 1) adapt the model, or 2) train a new model, but I don't know whether I have enough data to train and build a model. – Sumit Jangra Apr 16 '19 at 11:09
  • You have to train to recognize children's voices. You need 50 hours of speech to train a good system. – Nikolay Shmyrev Apr 16 '19 at 13:08
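
On the question of adding new words or accent-specific pronunciations: the thread does not answer it directly, so what follows is only a minimal sketch. It assumes the dictionary is a plain-text file in the CMUdict style that the Sphinx models ship with, one entry per line (`word PH1 PH2 ...`), with alternate pronunciations written as `word(2)`, `word(3)` and so on. The helper name `add_pronunciation` and the sample pronunciations are hypothetical, and any phones you add should come from the phone set of the acoustic model you are actually using.

```python
import re


def add_pronunciation(lines, word, phones):
    """Return a copy of `lines` with one more dictionary entry for `word`.

    `lines` holds entries such as "hello HH AH L OW". If the word already
    exists, the new entry is written as an alternate, e.g. "hello(2) ...".
    This helper is hypothetical, not part of any CMUSphinx tool.
    """
    word = word.lower()
    pron = " ".join(phones)
    # Matches "word ..." as well as "word(2) ...", "word(3) ...", etc.
    entry_re = re.compile(r"^" + re.escape(word) + r"(\(\d+\))?\s+(.*)$")

    existing = []
    for line in lines:
        match = entry_re.match(line)
        if match:
            existing.append(match.group(2).strip())

    # Skip the insert if this exact pronunciation is already listed.
    if pron in existing:
        return list(lines)

    key = word if not existing else f"{word}({len(existing) + 1})"
    return list(lines) + [f"{key} {pron}"]


# Illustrative entries only; the pronunciations are assumptions.
entries = ["hello HH AH L OW", "world W ER L D"]
entries = add_pronunciation(entries, "hello", ["HH", "EH", "L", "OW"])
entries = add_pronunciation(entries, "namaste", ["N", "AH", "M", "AH", "S", "T", "EY"])
```

The updated lines can then be written back to the .dict file that is passed to the decoder. A word that is new to the dictionary may also need to be present in the language model or grammar before it can be recognized.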