
I'm using prosody.lab, which uses HTK, to build a language model. I'm trying to train this new model, but I'm getting the following HTK error:

ERROR [+1452]  ReadDictProns: word HAO' out of order in dict dictionary.dict 

Below is how my dictionary is ordered:

EH EH0
ENSKA UH0 S K AH0
É'THO T OW0
HAO' HH AW0 OW0 Q
HENH HH AH0
KAHNAWAKE G AH0 N AH0 G EY0
KAHYÁ:TON' G AH0 Y AH0 D UH0 Q

While researching, I came across a website that says this error occurs because:

Unix sort doesn't seem to match the sort HTK is looking for. Python's sort function seems to work. Numbers are sorted with '.' before 0, shorter before longer (1 < 1.0 < 10 < 100).
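The quoted rule is what Python's default string comparison gives: strings are compared character by character using code points, and a string that is a prefix of another sorts first. A quick check with the numbers from the quote:

```python
# Python compares str values character by character using code points;
# if one string is a prefix of the other, the shorter one sorts first.
numbers = ["100", "10", "1.0", "1"]
ordered = sorted(numbers)
print(ordered)  # ['1', '1.0', '10', '100']

# '.' (U+002E) has a smaller code point than '0' (U+0030), so "1.0" < "10".
print(ord("."), ord("0"))  # 46 48
```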

I assume the sorting is affected by the special characters in my dictionary. Following the site's advice, I put those words into a list and sorted it with Python's sort function:

unsorted_list = ["EH", "ENSKA", "É'THO", "HAO'", "HENH", "KAHNAWAKE", "KAHYÁ:TON'"]
sorted(unsorted_list)

But the above code produces a list in the same order as my dictionary.

Then I copied my dictionary into another file (unsorted_dict.txt) and sorted it with the Unix sort command:

$ sort unsorted_dict.txt

But this also produces the same order as my dictionary:

EH
ENSKA
É'THO
HAO'
HENH
KAHNAWAKE
KAHYÁ:TON'

I've also tried switching HAO' and HENH in my dictionary file, giving the following order:

EH
ENSKA
É'THO
HENH
HAO'
KAHNAWAKE
KAHYÁ:TON'

But this order produces the following error:

ERROR [+1452]  ReadDictProns: word HENH out of order in dict dictionary.dict
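Both errors are consistent with a simple rule: no headword may sort before the one directly above it. The minimal sketch below mimics such a check, assuming (the post does not confirm this) that HTK compares the raw UTF-8 bytes of the words, the way C's strcmp would. `first_out_of_order` is a hypothetical helper, not part of HTK.

```python
def first_out_of_order(words):
    """Return the first word that sorts before the word above it, or None.

    Assumption: words are compared by their UTF-8 bytes (like C's strcmp).
    Equal neighbours are accepted so a word can be listed with several
    pronunciations.
    """
    previous = None
    for word in words:
        key = word.encode("utf-8")
        if previous is not None and key < previous:
            return word
        previous = key
    return None


original = ["EH", "ENSKA", "É'THO", "HAO'", "HENH", "KAHNAWAKE", "KAHYÁ:TON'"]
swapped = ["EH", "ENSKA", "É'THO", "HENH", "HAO'", "KAHNAWAKE", "KAHYÁ:TON'"]

print(first_out_of_order(original))  # HAO'
print(first_out_of_order(swapped))   # HENH
```

Under that assumption it flags HAO' for the original order and HENH for the swapped one: in both files É'THO (first byte 0xC3) sits directly above a word starting with the ASCII letter H (0x48).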
user2743

2 Answers


Very probably, the sort order HTK expects is fairly naive and based on the byte sequence used to encode the characters. In that case, É comes after all the unaccented characters, so this order would be right:

EH
ENSKA
HAO'
HENH
KAHNAWAKE
KAHYÁ:TON'
É'THO

See http://www.tamasoft.co.jp/en/general-info/unicode.html for the characters ordered by their byte sequence.
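A short sketch of the byte-sequence order described above, sorting the question's words on their UTF-8 encoding. For Python 3 strings this gives the same result as a plain `sorted(words)`, because UTF-8 preserves code point order. It should also match what GNU `sort` prints when run with `LC_ALL=C`, since the C locale compares plain bytes instead of using locale collation rules.

```python
words = ["EH", "ENSKA", "É'THO", "HAO'", "HENH", "KAHNAWAKE", "KAHYÁ:TON'"]

# Sort on the UTF-8 encoding of each word, i.e. plain byte order.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
for w in by_bytes:
    print(w)

# É (U+00C9) is stored as the two bytes C3 89, above every ASCII letter (41-5A).
print("É".encode("utf-8").hex())  # c389
```

This prints EH, ENSKA, HAO', HENH, KAHNAWAKE, KAHYÁ:TON', É'THO. HAO' lands before HENH because 'A' (0x41) is smaller than 'E' (0x45).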

Jonas

ERROR [+1452] ReadDictProns: word HAO' out of order in dict dictionary.dict

This issue occurs when the word "HAO'" does not exist in your input dictionary. You have to add that word to the dictionary with its CMU-style pronunciation.

That word will be in one of your training data sentences.

If the word is already in your dictionary, then the error comes from your YAML configuration file. It contains a list of pronunciations, and if you created a new conversion for any word, you have to add it there as well.

Note: if you haven't sorted your dictionary yet, sort it first with this command: "./sort.py lang.dict" or "./sort.py < lang.dict > new_lang.dict"
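A sketch of a dictionary sorter in the same spirit, assuming the byte-order requirement described in the first answer. It is not the aligner's own sort.py, whose behavior isn't shown here; the file name `sort_dict.py` and its command-line form are hypothetical. It sorts whole lines by their headword, so each pronunciation stays attached to its word.

```python
import sys


def headword(line: bytes) -> bytes:
    """Return the first whitespace-separated field of a dictionary line."""
    fields = line.split(None, 1)
    return fields[0] if fields else b""


def sort_dictionary(lines):
    """Sort raw dictionary lines by headword in plain byte order."""
    # Skip blank lines and make sure every entry ends with a newline.
    entries = [
        line if line.endswith(b"\n") else line + b"\n"
        for line in lines
        if line.strip()
    ]
    # sorted() is stable: several pronunciations of one word keep their order.
    return sorted(entries, key=headword)


# Hypothetical usage: python3 sort_dict.py lang.dict > new_lang.dict
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        sys.stdout.buffer.writelines(sort_dictionary(f))
```

Working on bytes keeps the result independent of the terminal's locale, which is exactly what a locale-aware `sort` does not guarantee.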

Please refer to this link for more information: https://github.com/prosodylab/Prosodylab-Aligner

Arif Rathod