I am trying to generate an ARPA-format language model with the following commands:
text2wngram < weather.txt | grep -v "</s> <s>" > weather.wngram
wngram2idngram -vocab weather.vocab < weather.wngram > weather.idngram
idngram2lm -vocab_type 0 -idngram weather.idngram -vocab weather.vocab -arpa weather.lm
But the second command, wngram2idngram, does not work and throws the following error:
text2idngram : Error : Must specify idngram file.
I changed the parameters as follows, and it works:
wngram2idngram -vocab weather.vocab -idngram weather.idngram < weather.wngram
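My question is: which form is correct, redirecting standard output to the idngram file (the first version) or naming it with -idngram (the second version)? I am using cmulmtk Version 3.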
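For reference, the sequence that runs for me can be sketched as one script. It uses only the flags that appear in this post; the build_lm helper name is hypothetical, and the -idngram flag on wngram2idngram is taken from the command that worked above, so I cannot say whether other versions of the toolkit accept it.

```shell
#!/bin/sh
# Sketch of the three-step pipeline, using only the flags shown in this post.
# build_lm is a hypothetical helper, not part of the toolkit.
build_lm() {
    base=$1    # expects "$base.txt" and "$base.vocab" in the current directory

    # 1. Word n-grams; drop lines containing "</s> <s>".
    text2wngram < "$base.txt" | grep -v "</s> <s>" > "$base.wngram"

    # 2. Word n-grams to id n-grams. The output file is named with -idngram
    #    (the form that worked for me) rather than redirected from stdout.
    wngram2idngram -vocab "$base.vocab" -idngram "$base.idngram" < "$base.wngram" || return 1

    # 3. Id n-grams to an ARPA-format language model.
    idngram2lm -vocab_type 0 -idngram "$base.idngram" -vocab "$base.vocab" -arpa "$base.lm"
}

# Run only when the toolkit and the input file are actually present.
if command -v text2wngram >/dev/null 2>&1 && [ -f weather.txt ]; then
    build_lm weather
else
    echo "toolkit or weather.txt not found; nothing was run" >&2
fi
```

Stopping after step 2 on failure means idngram2lm is never run against a missing or stale weather.idngram.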