I am trying to build an ASR system, using existing pre-trained models as a starting point. I am stuck on how to add new words to the trained model so that it recognizes them correctly the next time; some sort of machine learning concept, I assume. Any ideas would be helpful.

Vamanan Rajadurai

Vipin YoYo
1 Answer
2
There are two things you might need:
Lexicon: Look for a file like lexicon.txt in your data folder and add your words with their corresponding phone sequences, one pronunciation per line, like:
    speech s p iy ch
    the dh ax
    the dh iy
Language Model: Look for a file like XXX.lm in your data folder and add your word to the 1-grams section with a (log) probability, like:
    \data\
    ngram 1=200
    ngram 2=4000
    ...
    \1-grams:
    -7.3241 the
    ...
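The lexicon step can also be scripted rather than edited by hand. Below is a minimal Python sketch, assuming a plain-text lexicon with one `word phone phone ...` entry per line as in the example above. The helper name `add_lexicon_entry` and the sample word are hypothetical, and the phones you pass must come from the phone set your acoustic model was trained with.

```python
from pathlib import Path


def add_lexicon_entry(lexicon_path, word, phones):
    """Append 'word p1 p2 ...' to the lexicon file.

    Returns True if the entry was added, False if the exact same
    pronunciation was already present. A word may have several
    pronunciations (like 'the' above), so only exact duplicates are skipped.
    """
    path = Path(lexicon_path)
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    entry = " ".join([word, *phones])
    # Normalize whitespace so tab- and space-separated entries compare equal.
    existing = {" ".join(line.split()) for line in lines}
    if entry in existing:
        return False
    lines.append(entry)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

For example, `add_lexicon_entry("lexicon.txt", "sheep", ["sh", "iy", "p"])` would append the line `sheep sh iy p`; the phone symbols here are only illustrative.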
After this, rebuild the decoding graph HCLG.fst from these two updated files.
Note: The probabilities in the language model affect the recognition results, so you need to choose a sensible value, or use a toolkit such as SRILM to estimate the model from the text of your corpus.
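If you do edit the language model by hand, remember that an ARPA file also stores the number of entries per order in its `\data\` header, so the `ngram 1=` count has to be increased along with the new line. Here is a rough Python sketch of that edit, assuming a standard ARPA layout with a `\1-grams:` section header. The function name `add_unigram` is made up for illustration, and the log10 probability you pass in is your own choice, not something the script estimates. No backoff weight is written for the new word, which ARPA readers generally treat as a weight of 0.

```python
import re


def add_unigram(arpa_text, word, log10_prob):
    """Return ARPA text with `word` added to the 1-grams section.

    The 'ngram 1=N' count in the header is incremented to match.
    If the word is already a unigram, the text is returned unchanged.
    """
    lines = arpa_text.splitlines()
    count_idx = next(
        i for i, l in enumerate(lines) if re.fullmatch(r"ngram 1=\d+", l.strip())
    )
    start = next(i for i, l in enumerate(lines) if l.strip() == "\\1-grams:")

    # The section ends at the first blank line or the next '\...' marker.
    end = start + 1
    while end < len(lines) and lines[end].strip() and not lines[end].startswith("\\"):
        end += 1

    # Each unigram line is: log10_prob <tab> word [<tab> backoff_weight]
    for line in lines[start + 1:end]:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == word:
            return arpa_text

    count = int(lines[count_idx].strip().split("=")[1])
    lines[count_idx] = f"ngram 1={count + 1}"
    lines.insert(end, f"{log10_prob:.4f}\t{word}")
    return "\n".join(lines) + "\n"
```

A model patched this way is no longer properly normalized, which is one more reason to regenerate it from corpus text with SRILM when you can.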

coldsheep
This answer is the right way to go. Any idea how to add a unigram to an ARPA file? Manually? – xtluo Nov 27 '19 at 09:45