Apologies for the vague title. I have never worked with the Azure API and cannot tell what is wrong with this code; I copied it from the documentation and filled in my own details.

Here is the code:

from azure.cognitiveservices.speech import AudioDataStream, SpeechConfig, SpeechSynthesizer, SpeechSynthesisOutputFormat
from azure.cognitiveservices.speech.audio import AudioOutputConfig

speech_config = SpeechConfig(subscription="ImagineHereAreNumbers", region="westeurope")


speech_config.speech_synthesis_language = "en-US"
speech_config.speech_synthesis_voice_name = "ChristopherNeural"

audio_config = AudioOutputConfig(filename=r'C:\Users\TheD4\OneDrive\Desktop\SpeechFolder\Azure.wav')

synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
synthesizer.speak_text_async("A simple test to write to a file.")

When I run this I get no error, and a .wav file does appear in the target folder, but the file is 0 bytes and looks corrupted.

What confuses me is that if I remove these two lines:

speech_config.speech_synthesis_language = "en-US"
speech_config.speech_synthesis_voice_name = "ChristopherNeural"

So it becomes this

from azure.cognitiveservices.speech import AudioDataStream, SpeechConfig, SpeechSynthesizer, SpeechSynthesisOutputFormat
from azure.cognitiveservices.speech.audio import AudioOutputConfig

speech_config = SpeechConfig(subscription="ImagineHereAreNumbers", region="westeurope")


audio_config = AudioOutputConfig(filename=r'C:\Users\TheD4\OneDrive\Desktop\SpeechFolder\Azure.wav')

synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
synthesizer.speak_text_async("A simple test to write to a file.")

It suddenly works, but with what I assume is the default voice.

So here is my question: how do I choose the voice I want? (Specifically, I am after "en-US-JennyNeural" with style="customerservice", or something along those lines.)

Thank you in advance!

Bambi2k21

1 Answer

ChristopherNeural is not a valid voice name. The actual name of the voice is en-US-ChristopherNeural.

speech_config.speech_synthesis_voice_name = "en-US-ChristopherNeural"

This is well-documented on the Language support page of the Speech services documentation.
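To see why the original script produced a 0-byte file without complaining, it can help to inspect the synthesis result: as far as I know, problems such as an unknown voice name are reported on the result object rather than raised as exceptions. Below is a minimal sketch, assuming the `azure-cognitiveservices-speech` package is installed. The helper names `has_locale_prefix` and `synthesize_to_file` are made up for illustration, and the prefix check is only a rough heuristic of mine, not the service's own validation.

```python
def has_locale_prefix(voice_name: str) -> bool:
    """Heuristic: full voice names look like '<lang>-<REGION>-<ShortName>'."""
    parts = voice_name.split("-")
    return len(parts) >= 3 and all(parts)


def synthesize_to_file(text, voice_name, wav_path, key, region):
    """Synthesize text to a WAV file and raise if the service cancels."""
    # Imported lazily so the heuristic above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    if not has_locale_prefix(voice_name):
        raise ValueError(f"{voice_name!r} seems to lack a locale prefix such as 'en-US-'")

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_synthesis_voice_name = voice_name
    audio_config = speechsdk.audio.AudioOutputConfig(filename=wav_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # .get() blocks until synthesis has finished and returns the result object.
    result = synthesizer.speak_text_async(text).get()
    if result.reason == speechsdk.ResultReason.Canceled:
        details = result.cancellation_details
        raise RuntimeError(f"Synthesis canceled: {details.reason}; {details.error_details}")
    return result
```

Called with `voice_name="ChristopherNeural"`, this sketch fails fast with a `ValueError` instead of writing an empty file; with `"en-US-ChristopherNeural"` it proceeds to the service call.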

For other, more fine-grained control over voice characteristics, you'll require the use of SSML as outlined in text-to-speech-basics.py.
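For anyone unsure how SSML fits into Python code: the SDK's `speak_ssml_async` method accepts the SSML document as an ordinary string. The sketch below builds such a string with a speaking style and hands it to the synthesizer. It assumes the `azure-cognitiveservices-speech` package is installed and that the chosen voice supports the requested style; the function names `build_ssml` and `speak_ssml_to_file` are hypothetical helpers, not part of the SDK.

```python
from xml.sax.saxutils import escape


def build_ssml(text, voice="en-US-JennyNeural", style="customerservice", lang="en-US"):
    """Wrap text in SSML that asks a neural voice for a speaking style."""
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="{lang}">'
        f'<voice name="{voice}">'
        # escape() keeps characters such as & and < from breaking the XML.
        f'<mstts:express-as style="{style}">{escape(text)}</mstts:express-as>'
        f"</voice></speak>"
    )


def speak_ssml_to_file(ssml, wav_path, key, region):
    """Send an SSML string to the service and write the audio to wav_path."""
    # Imported lazily so build_ssml can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioOutputConfig(filename=wav_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )
    # The voice and style come from the SSML itself, not from speech_config.
    return synthesizer.speak_ssml_async(ssml).get()
```

If the SSML already lives in an .xml file, its contents can be read with `open(path, encoding="utf-8").read()` and passed to `speak_ssml_to_file` in place of the `build_ssml` output.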

esqew
  • I see, thank you. Is there a way I can add the "customer service" or "newscast" styles? – Bambi2k21 Nov 29 '21 at 16:09
  • As I mentioned in my answer, the “styles” of a specific voice can be invoked with specially crafted SSML (I even linked an example from the official GitHub repository); is there something that’s unclear? – esqew Nov 30 '21 at 03:35
  • Yeah I have no idea how to work with SSML, XML. That is why I asked, to know if there is a way to do the same thing without them. – Bambi2k21 Nov 30 '21 at 04:26
  • There is not. Unfortunately, the Python library you’re using doesn’t support many of the functions provided by SSML, including the voice style of the generated speech. – esqew Nov 30 '21 at 04:41
  • That's a bummer. Is there a tutorial or something similar for hooking this sort of XML up to Python? I have XML that I assembled piece by piece from the docs (https://paste.pythondiscord.com/viyinabiku.xml), but I have no idea how to use it from my Python code. If you know how, please help me. – Bambi2k21 Nov 30 '21 at 04:57