I am trying to generate an ARPA-format language model with the following commands:

text2wngram < weather.txt | grep -v "</s> <s>" > weather.wngram
wngram2idngram -vocab weather.vocab < weather.wngram > weather.idngram 
idngram2lm -vocab_type 0 -idngram weather.idngram -vocab weather.vocab -arpa weather.lm

But the second command, wngram2idngram, fails with the following error:

text2idngram : Error : Must specify idngram file.

I changed the parameters as follows, and it works:

wngram2idngram -vocab weather.vocab -idngram weather.idngram < weather.wngram

Which of the two is correct? I am using cmuclmtk version 3.

Stefanus
g10dras

1 Answer

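The second variant is correct. As the error message says, the idngram output file must be specified explicitly with -idngram rather than captured by redirecting stdout.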
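
For reference, here is a minimal sketch of the whole pipeline using the second variant. It assumes the toolkit binaries are on PATH and reuses the file names from the question; build_arpa_lm is a hypothetical helper name for illustration, not part of the toolkit.

```shell
#!/bin/sh
# Sketch only: the three steps from the question, with the step-2 output
# file passed via -idngram. build_arpa_lm is a hypothetical helper name.
build_arpa_lm() {
    base=$1   # e.g. "weather" -> weather.txt, weather.vocab, weather.lm

    # Step 1: count word n-grams and drop the sentence-boundary bigram.
    text2wngram < "$base.txt" | grep -v "</s> <s>" > "$base.wngram" || return 1

    # Step 2: map words to ids. The output file is named with -idngram
    # instead of being redirected from stdout.
    wngram2idngram -vocab "$base.vocab" -idngram "$base.idngram" \
        < "$base.wngram" || return 1

    # Step 3: build the ARPA-format model from the id n-grams.
    idngram2lm -vocab_type 0 -idngram "$base.idngram" \
        -vocab "$base.vocab" -arpa "$base.lm"
}

# Run only when the toolkit is installed and the input file exists.
if command -v wngram2idngram >/dev/null 2>&1 && [ -f weather.txt ]; then
    build_arpa_lm weather
fi
```

Wrapping the steps in a function keeps the file names in one place, so the same -idngram path is used by both the second and third commands.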

That said, we recommend using SRILM instead.

Nikolay Shmyrev