I have created my own CMUSphinx language model for Arabic, for software that will listen to a user and execute commands. I built the dictionary manually by hand, and I converted the "arpa" language model to the "dmp" format with the command sphinx_lm_convert -i ar.lm -o ar.lm.dmp.
Here are the files I have so far:
- .txt (the commands text file)
- .wfreq (word frequency file)
- .idngram (n-gram file)
- .dic (dictionary file)
- .phone (phonemes file)
- .lm (arpa language model file)
- .lm.dmp (binary DMP dump of the language model)
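For context, these files came out of the usual CMUCLMTK pipeline. Roughly, the commands I ran looked like the following (the "commands.*" file names are just my own; I believe an intermediate .vocab file was also produced along the way):

```
text2wfreq < commands.txt > commands.wfreq
wfreq2vocab < commands.wfreq > commands.vocab
text2idngram -vocab commands.vocab -idngram commands.idngram < commands.txt
idngram2lm -idngram commands.idngram -vocab commands.vocab -arpa ar.lm
sphinx_lm_convert -i ar.lm -o ar.lm.dmp
```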
I then recorded myself saying each word; each word has its own .wav file, and the recordings are all in one folder, separate from the folder that contains the .dic, .txt, and .lm files.
My question is: what is the next step? I was reading the tutorial at http://cmusphinx.sourceforge.net/wiki/tutorial.
It says that adapting an existing acoustic model is the next step after building the language model. Isn't that step training the language model?
And if it is training, I have all the required files except these two:
- .transcription
- .fileids
What should be inside these two files?
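If I understand the adaptation section of the tutorial correctly, .fileids lists each recording's name without the .wav extension, one id per line, and .transcription holds the spoken text wrapped in <s> ... </s> followed by the matching id in parentheses. Here is a small Python sketch of how I would generate both files; the word_0001 / "iftah" names are just made-up examples, not my real data:

```python
def make_adaptation_files(utterances):
    """Build the contents of the .fileids and .transcription files.

    utterances: list of (fileid, transcript) pairs, where fileid is the
    .wav file name without its extension.
    Returns (fileids_text, transcription_text).
    """
    # .fileids: one recording id per line, no extension
    fileids = "".join(fid + "\n" for fid, _ in utterances)
    # .transcription: "<s> spoken text </s> (fileid)" per line
    transcription = "".join(
        "<s> %s </s> (%s)\n" % (text, fid) for fid, text in utterances
    )
    return fileids, transcription


if __name__ == "__main__":
    pairs = [("word_0001", "iftah"), ("word_0002", "ighlaq")]
    fileids, trans = make_adaptation_files(pairs)
    with open("commands.fileids", "w", encoding="utf-8") as f:
        f.write(fileids)
    with open("commands.transcription", "w", encoding="utf-8") as f:
        f.write(trans)
```

With that input, commands.fileids would contain "word_0001" and "word_0002" on separate lines, and commands.transcription would contain lines like "<s> iftah </s> (word_0001)". Is that the expected format?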
Thanks.