I'm using prosody.lab, which uses HTK, to build a language model. When I try to train the new model, I get the following HTK error:
ERROR [+1452] ReadDictProns: word HAO' out of order in dict dictionary.dict
Below is how my dictionary is ordered:
EH EH0
ENSKA UH0 S K AH0
É'THO T OW0
HAO' HH AW0 OW0 Q
HENH HH AH0
KAHNAWAKE G AH0 N AH0 G EY0
KAHYÁ:TON' G AH0 Y AH0 D UH0 Q
So doing some research I came across a website that says this error is produced because:
Unix sort doesn't seem to match the sort HTK is looking for. Python's sort function seems to work. Numbers are sorted with '.' before 0, shorter before longer (1 < 1.0 < 10 < 100).
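The ordering described in that quote can be illustrated with a short sketch. It assumes one dictionary entry per line, with the word as the first whitespace-separated field; the helper name sort_dict_lines is hypothetical and not part of HTK or prosody.lab.

```python
def sort_dict_lines(lines):
    """Sort pronunciation-dictionary lines by their headword (first field)."""
    # Drop blank lines and trailing newlines before sorting.
    entries = [line.rstrip("\n") for line in lines if line.strip()]
    # Python compares str values code point by code point, with no locale rules.
    return sorted(entries, key=lambda line: line.split(None, 1)[0])


# The numeric ordering from the quote: '.' (code point 46) sorts before '0' (48),
# and a string sorts before any longer string that it is a prefix of.
print(sorted(["100", "10", "1.0", "1"]))  # ['1', '1.0', '10', '100']
```

Because the comparison is by code point rather than by locale, accented capitals such as É are not treated as variants of E here, which may differ from what a locale-aware sort does.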
I assume the sorting is being affected by the special characters in my dictionary. Following the site's advice, I put those words into a list and sorted it with Python's sorted function:
unsorted_list = ["EH", "ENSKA", "É'THO", "HAO'", "HENH", "KAHNAWAKE", "KAHYÁ:TON'"]
print(sorted(unsorted_list))
But the above code produces a list in the same order as my dictionary.
Next I put my dictionary into another file (unsorted_dict.txt) and sorted it with the Unix sort command:
$ sort unsorted_dict.txt
But this also produces the same order as my dictionary:
EH
ENSKA
É'THO
HAO'
HENH
KAHNAWAKE
KAHYÁ:TON'
I've also tried switching HAO' and HENH in my dictionary file, giving the following order:
EH
ENSKA
É'THO
HENH
HAO'
KAHNAWAKE
KAHYÁ:TON'
But this order produces the following error:
ERROR [+1452] ReadDictProns: word HENH out of order in dict dictionary.dict
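One way to test the special-character hypothesis is to check both orderings under a plain byte-wise comparison. This is only a sketch: the helper first_out_of_order is hypothetical, and the idea that HTK compares raw UTF-8 bytes rather than using locale rules is an assumption, not something confirmed by the error message.

```python
def first_out_of_order(words):
    """Return the first word that sorts before its predecessor, or None."""
    for previous, current in zip(words, words[1:]):
        # Compare the UTF-8 bytes; that HTK compares this way is an assumption.
        if current.encode("utf-8") < previous.encode("utf-8"):
            return current
    return None


# The two orderings tried in the dictionary file.
original = ["EH", "ENSKA", "É'THO", "HAO'", "HENH", "KAHNAWAKE", "KAHYÁ:TON'"]
swapped = ["EH", "ENSKA", "É'THO", "HENH", "HAO'", "KAHNAWAKE", "KAHYÁ:TON'"]

print(first_out_of_order(original))
print(first_out_of_order(swapped))
```

If É is stored as the single precomposed character U+00C9, this flags HAO' for the first ordering and HENH for the second, the same words named in the two HTK errors. That would be consistent with É'THO being the misplaced entry under byte ordering, though it does not by itself prove how HTK sorts.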