Some background: while trying to build a unit selection voice, I followed the steps here: https://github.com/CSTR-Edinburgh/CSTR-Edinburgh.github.io/blob/master/_posts/2016-8-21-Multisyn_unit_selection.md and used a voice definition from here: https://raw.githubusercontent.com/CSTR-Edinburgh/merlin/master/egs/hybrid_synthesis/s1/voice_definition_files/unit_selection/cstr_us_awb_arctic_multisyn.scm. Unfortunately, the wavs were too noisy, so I ended up hand-labelling them and skipping the automatic labelling process.
The voice is OK now but still needs some work. One error that occurs constantly is that Festival reports "Missing diphone" for every pause-to-phone (and phone-to-pause) transition, e.g.:
```
festival> (utt.relation.print (SayText "I can say anything I want.") 'Unit)
Missing diphone: #_ay
diphone still missing, backing off: #_ay
backed off: #_ay -> #_ax
diphone still missing, backing off: #_ax
backed off: #_ay -> #_#
diphone still missing, backing off: #_#
backed off: #_ay ->
Missing diphone: ey_eh
Interword so inserting silence.
diphone still missing, backing off: ey_#
backed off: ey_eh -> ax_#
diphone still missing, backing off: ax_#
backed off: ey_eh -> #_#
diphone still missing, backing off: #_#
backed off: ey_eh ->
Missing diphone: #_eh
diphone still missing, backing off: #_eh
backed off: #_eh -> #_ax
diphone still missing, backing off: #_ax
backed off: #_eh -> #_#
diphone still missing, backing off: #_#
backed off: #_eh ->
Missing diphone: t_#
diphone still missing, backing off: t_#
backed off: t_# -> #_#
diphone still missing, backing off: #_#
backed off: t_# ->
```
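To see at a glance which diphones the back-off chain actually failed on, a small helper like the one below can pull every name from the "Missing diphone" and "diphone still missing, backing off" lines and flag whether it involves the pause symbol `#`. This is a minimal sketch that assumes the log wording is exactly as printed above; the function names are hypothetical and not part of Festival.

```python
import re

# Lines where Festival names a diphone it could not find. The wording is copied
# from the session log above; other Festival versions may phrase it differently.
NOT_FOUND = re.compile(r"^(?:Missing diphone|diphone still missing, backing off): (\S+)$")


def unfound_diphones(log_text):
    """Return the distinct diphones reported as missing, in first-seen order."""
    names = []
    for line in log_text.splitlines():
        match = NOT_FOUND.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def touches_pause(diphone, pause="#"):
    """True if either half of a left_right diphone name is the pause phone."""
    left, _, right = diphone.partition("_")
    return pause in (left, right)


if __name__ == "__main__":
    log = """Missing diphone: #_ay
diphone still missing, backing off: #_ay
backed off: #_ay -> #_ax
diphone still missing, backing off: #_ax
backed off: #_ay -> #_#
diphone still missing, backing off: #_#
backed off: #_ay ->
Missing diphone: t_#
diphone still missing, backing off: t_#
backed off: t_# -> #_#"""
    for name in unfound_diphones(log):
        print(name, "pause" if touches_pause(name) else "phone-phone")
```

Applied to the full log above, every unfound diphone except `ey_eh` has `#` on one side, and even `ey_eh` only fails because its silence-insertion fallback (`ey_#`) is missing too.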
I tried replacing `sil` and `sp` (from the automatic process) in the labels with `pau` and `h#` (to match the silences used in `festival/lib/radio_phones.scm`), and I also tried replacing them with just `#`, but this didn't change anything. The source wav/lab files definitely contain the transitions above (e.g. several start with "I can"), but Festival never seems to use them.
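One way to double-check that the labels really contain these transitions is to count diphones directly in the `.lab` files after folding every silence label to `#`. This is a minimal sketch that assumes a directory of `*.lab` files whose data lines look like `0.150 125 ay` (xwaves) or `0 1500000 ay` (HTK), i.e. a numeric first field and the phone in the third field; the silence set and all function names are illustrative assumptions, not part of Festival. It only inspects the label files, so a non-zero count says nothing about whether the voice build actually kept those segments as units.

```python
import sys
from collections import Counter
from pathlib import Path

# Assumed set of silence labels used at some point in these labels; all are
# folded to the single pause symbol "#" that appears in Festival's messages.
PAUSE = "#"
SILENCES = {"sil", "sp", "pau", "h#", "#"}


def read_phones(text):
    """Phone labels from one .lab file, with silences folded to PAUSE."""
    phones = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # header lines and the lone '#' separator line
        try:
            float(fields[0])  # data lines start with a time stamp
        except ValueError:
            continue
        label = PAUSE if fields[2] in SILENCES else fields[2]
        # Merge runs such as 'sil sp' into a single pause.
        if not (label == PAUSE and phones and phones[-1] == PAUSE):
            phones.append(label)
    return phones


def diphones(phones):
    """Adjacent left_right pairs, named the way Festival prints them (e.g. '#_ay')."""
    return [f"{a}_{b}" for a, b in zip(phones, phones[1:])]


def count_diphones(lab_dir):
    """Count every diphone across all *.lab files in lab_dir."""
    counts = Counter()
    for lab in sorted(Path(lab_dir).glob("*.lab")):
        counts.update(diphones(read_phones(lab.read_text())))
    return counts


if __name__ == "__main__":
    counts = count_diphones(sys.argv[1] if len(sys.argv) > 1 else "lab")
    for name in ["#_ay", "#_eh", "ey_#", "t_#"]:
        print(name, counts[name])
```

Run it as, for example, `python count_diphones.py path/to/lab` (the script name is hypothetical). If these counts are non-zero while Festival still reports the same diphones as missing, the labels themselves are probably not the problem.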
How can I get Festival to use the pause-to-phone transitions in the source data?
Thanks!