
I have been trying to build a new German voice for MaryTTS for a while now, but have not succeeded so far. I followed a tutorial (https://github.com/marytts/marytts/wiki/HMMVoiceCreation) and tried to understand each step. No matter what I do, I get stuck at step 14 (HMMVoiceMakeVoice) with the error:

ERROR [+2121] HInit: Too Few Observation Sequences

which usually means that the phone being trained (en9 in this example) was not found in my data set.

After changing the locale, the same error happened on the phone "de27", as Nikolay Shmyrev pointed out.

I doubt that, though, since I use about 500 audio files, each at least 5 seconds long, so well over an hour of audio in total.
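The size of the corpus can be checked directly rather than estimated. The following is a minimal sketch, assuming the recordings are uncompressed PCM wav files in a single folder; the function names and the `wav` folder path are hypothetical and not part of the MaryTTS tooling.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path) -> float:
    """Return the length of one PCM wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_duration_seconds(wav_dir: str) -> float:
    """Sum the durations of all *.wav files directly inside wav_dir."""
    return sum(wav_duration_seconds(p) for p in sorted(Path(wav_dir).glob("*.wav")))


if __name__ == "__main__":
    # "wav" is an assumed folder name inside the voice-building directory.
    seconds = total_duration_seconds("wav")
    print(f"{seconds:.0f} s ({seconds / 60:.1f} min)")
```

If the reported total is far below what the file count suggests, some recordings are shorter than assumed or were not picked up at all.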

In fact, I skipped the "en9" phone, since I don't know what exactly it represents. The next one to fail was "oI", which I located manually about ten times in the first few audio files.

I think the automatic labeling (steps 2-4) is not working properly, but I don't know what I can do to get a better result.
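One way to see whether the labeling really produced too few examples of a phone is to count the phone symbols in the generated label files. The following is a rough sketch under stated assumptions: each label file has an optional header that ends with a line containing only `#`, and every following line ends with the phone symbol as its last whitespace-separated field. The function names, the `*.lab` pattern, the folder name, and the threshold of 10 are all hypothetical and may need adjusting to the actual files.

```python
from collections import Counter
from pathlib import Path


def count_phones(label_text: str) -> Counter:
    """Count phone symbols in the text of one label file."""
    lines = label_text.splitlines()
    stripped = [line.strip() for line in lines]
    # Skip the header if a lone '#' separator line is present (assumed format).
    if "#" in stripped:
        lines = lines[stripped.index("#") + 1:]
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            # Assumption: the phone symbol is the last field on the line.
            counts[fields[-1]] += 1
    return counts


def count_phones_in_dir(label_dir: str, pattern: str = "*.lab") -> Counter:
    """Aggregate phone counts over all label files in a folder."""
    total = Counter()
    for path in sorted(Path(label_dir).glob(pattern)):
        total += count_phones(path.read_text(encoding="utf-8", errors="replace"))
    return total


def rare_phones(counts: Counter, min_count: int = 10) -> list:
    """Return (phone, count) pairs that occur fewer than min_count times."""
    return sorted((p, n) for p, n in counts.items() if n < min_count)


if __name__ == "__main__":
    # "lab" is an assumed folder name for the automatically generated labels.
    for phone, n in rare_phones(count_phones_in_dir("lab")):
        print(f"{phone}\t{n}")
```

Phones that show up with only a handful of instances are the likely candidates for the HInit error, and phones that are missing entirely from the counts point to a labeling or phoneset problem.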

Edit: I uploaded all the files generated up to this step; they can be inspected on this shared Google Drive. Note that, for copyright reasons, I could not upload the wav folder. The logs directory contains the logs from each step. I couldn't find any problems there, but maybe someone else will.

I do not completely understand the structure of the generated data, but I thought that changing MARYBASE/mary/trickyPhones.txt and running the make tools again would be enough to change the map name from "tS" to "Z", which sounds about the same in German. But HMMVoiceMakeVoice still produces the same output.

Poehli
  • You need to study the logs and data to get more details, and you need to share the logs to get help; ideally, share the whole voice folder on Google Drive. Neither "en9" nor "ol" is a proper phone in the German phoneset, so you probably have an issue with the phoneset or dictionary, most likely missing spaces. You can find the German phoneset here: https://github.com/marytts/marytts-lexicon-de/blob/master/modules/de/lexicon/allophones.de.xml – Nikolay Shmyrev Jan 16 '18 at 01:01
  • I did as you asked. What dictionary do you mean? The transcription file I created? If so, I don't see anything wrong with it. Also, I don't know where I can see my phoneset; I thought I used the German one, because the marytts-server needs to be running. – Poehli Jan 16 '18 at 13:57
  • Ok, so it warns you about "de27", not "en9". "de27" is a map name from the phone "tS". This phone is indeed rare. There are only 7 instances of this phone in your training data, so it is not quite sufficient. I would probably replace phone "tS" with some other phone in the lexicon. You can also add some more examples with "tS". – Nikolay Shmyrev Jan 16 '18 at 15:10
  • The total length of your corpus is just 1044 seconds, or about 17 minutes. More data would be nice to start with. – Nikolay Shmyrev Jan 16 '18 at 15:20
  • I was wondering why it changed from "en9" to "de27"; I think I didn't change the locale until later that day. And I know there is not enough audio to get a good result. I just wanted to test first, before putting more work into it. If I pronounce that phone, it kind of sounds like a "z", but I haven't been able to locate the lexicon you were talking about. More on that in the question. – Poehli Jan 16 '18 at 18:58

0 Answers