I am going through hbka.pdf (the WFST paper): https://cs.nyu.edu/~mohri/pub/hbka.pdf
In it, the input label i, the output label o, and the weight w of a transition are marked on the corresponding directed arc as i:o/w.
It does not make sense to me how a transducer can output the entire word on the initial transition itself. If the entire word were output on the final transition, that would make sense to me.
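To make the question concrete, here is a toy sketch of my own (not taken from the paper) of one pronunciation path for the word "data" with phones d ey dx ax, under the two label placements, writing each arc as (source, input, output, destination):

```python
# Toy sketch (my own, not from the paper): one pronunciation path for "data"
# (phones d ey dx ax). "<eps>" stands for the epsilon (empty) label.

EPS = "<eps>"

# Word label on the INITIAL transition, as the paper recommends:
#   0 --d:data--> 1 --ey:eps--> 2 --dx:eps--> 3 --ax:eps--> 4
early = [
    (0, "d",  "data", 1),
    (1, "ey", EPS,    2),
    (2, "dx", EPS,    3),
    (3, "ax", EPS,    4),
]

# Word label delayed to the FINAL transition, which is what felt natural to me:
#   0 --d:eps--> 1 --ey:eps--> 2 --dx:eps--> 3 --ax:data--> 4
late = [
    (0, "d",  EPS,    1),
    (1, "ey", EPS,    2),
    (2, "dx", EPS,    3),
    (3, "ax", "data", 4),
]

for name, path in [("early", early), ("late", late)]:
    print(name, " ".join(f"{i}:{o}" for _, i, o, _ in path))
```

Both paths read the same phone string and emit the same single word; the only difference is which arc carries the output label.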
Later I saw the following on page 19:
"In order for this transducer to efficiently compose with G, the output (word) labels must be placed on the initial transitions of the words; other locations would lead to delays in the composition matching, which could consume significant time and space."
ChatGPT answers that "placing the output labels on the initial transitions of the words in the Word bigram transducer enables more efficient composition with another transducer by optimizing the matching and combination of transitions."
But how exactly does it happen?
"Placing the output labels on the initial transitions ensures that the word transitions in the Word bigram transducer align directly with the transitions in the other transducer."
But still, the finite-state transducer has to figure out the word from the phone input symbols (d, ey, dx, ax); how can that word already be the output of the initial transition?
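To pin down what I am asking, I also tried to model the composition step myself. The sketch below is entirely my own construction and assumes a much-simplified picture: a toy lexicon with three words sharing a start state, a toy grammar G that accepts only "data", and a naive walk over (lexicon state, grammar state) pairs standing in for real WFST composition. With the word label on the initial arc, the words the grammar rejects seem to die at their very first arc; with the label on the final arc, their whole pronunciation paths get explored before the mismatch is found.

```python
from collections import defaultdict

EPS = "<eps>"

def lexicon(early):
    """Toy lexicon: three words sharing start state 0, one linear path each.
    Arcs are (src, ilabel, olabel, dst); the word label goes on the first
    arc if early=True, otherwise on the last arc."""
    prons = {"data": ["d", "ey", "dx", "ax"], "dew": ["d", "uw"], "day": ["d", "ey"]}
    arcs, nxt = [], 1
    for word, phones in prons.items():
        src = 0
        for k, ph in enumerate(phones):
            last = (k == len(phones) - 1)
            out = word if (early and k == 0) or (not early and last) else EPS
            arcs.append((src, ph, out, nxt))
            src, nxt = nxt, nxt + 1
    return arcs

# Toy grammar G: only the word "data" is allowed.
G = [(0, "data", "data", 1)]

def explored_pairs(L, G):
    """Naive stand-in for composition L o G: count the (L state, G state)
    pairs that get built.  An output-epsilon arc in L advances L alone;
    a word output must match a G arc immediately, or the path dies."""
    l_arcs, g_arcs = defaultdict(list), defaultdict(list)
    for s, i, o, d in L: l_arcs[s].append((i, o, d))
    for s, i, o, d in G: g_arcs[s].append((i, o, d))
    seen, stack = set(), [(0, 0)]
    while stack:
        ls, gs = stack.pop()
        if (ls, gs) in seen:
            continue
        seen.add((ls, gs))
        for i, o, ld in l_arcs[ls]:
            if o == EPS:
                stack.append((ld, gs))
            else:
                stack.extend((ld, gd) for gi, go, gd in g_arcs[gs] if gi == o)
    return len(seen)

print("word label on initial arcs:", explored_pairs(lexicon(early=True), G))
print("word label on final arcs:  ", explored_pairs(lexicon(early=False), G))
```

On this toy it is only 5 vs. 7 explored state pairs, but I assume the gap blows up with a realistic vocabulary and grammar. Is that the "delay in the composition matching" the paper is referring to?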