I am trying to understand the internal workings of Kaldi, but I am having trouble following the technical details in Kaldi's documentation.
I want to get a high-level understanding of the various objects first, to help me digest what is presented. Specifically, I would like to know what the .tree, final.mdl, and HCLG.fst files are, what is needed to generate them, and how they are used.
My vague understanding is as follows (please correct me if I am wrong):
- final.mdl is the acoustic model and contains the probability of transitioning from one phone to another.
- HCLG.fst is a graph that, given a sequence of phones, generates the most likely word sequence based on the lexicon, grammar, and language model.
- "Decoding graph" is the term for generating HCLG.fst.
- I am not quite sure what adding a self-loop is. Is it similar to the Kleene operator?
- A lattice contains alternative word sequences for an utterance.
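To make the last point concrete, this is roughly the picture I have in mind for a lattice, written as a toy Python sketch. The graph, the words, and the `word_sequences` helper are all made up for illustration; I am not claiming this is how Kaldi actually stores or traverses lattices.

```python
from typing import Dict, List, Tuple

# Toy lattice: state -> list of (word, next_state) arcs.
# Hypothetical data for illustration only, not Kaldi's lattice format.
Lattice = Dict[int, List[Tuple[str, int]]]


def word_sequences(lattice: Lattice, start: int, final: int) -> List[List[str]]:
    """Enumerate every word sequence on a path from start to final.

    Assumes the graph is acyclic, so the recursion terminates.
    """
    if start == final:
        return [[]]
    paths: List[List[str]] = []
    for word, next_state in lattice.get(start, []):
        for rest in word_sequences(lattice, next_state, final):
            paths.append([word] + rest)
    return paths


# Two competing hypotheses for the same (imaginary) utterance.
toy: Lattice = {
    0: [("recognize", 1), ("wreck", 2)],
    1: [("speech", 4)],
    2: [("a", 3)],
    3: [("nice", 5)],
    5: [("beach", 4)],
}

for seq in word_sequences(toy, 0, 4):
    print(" ".join(seq))
```

In this picture each path from the start state to the final state is one alternative transcription. Is that the right way to think about it?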
I understand there is a lot to cover, but any help is appreciated!