I am working on a project to build a speech synthesizer for my local language using an HMM-based approach. So far, I have been able to generate a forced-alignment file (aligned.mlf) as explained in the HTK Book. However, I have not been able to find any step-by-step instructions on how to build the synthesizer using HTS. What I have done so far is download the sample speaker-dependent demo from the HTS website and run the training on its data. The voice folder now contains a cmu_us_arctic_slt.htsvoice file. So my two-part question is:
1) How do I use this file as a voice in Festival?
2) How can I generate the label and utt files needed to train my voice from the forced alignment file I have?
Any help will be greatly appreciated. Thanks.