I am able to generate a WAV file of "Mary had a little lamb" using the code below, but it fails when I try to generate an MP3.
#https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-text-to-speech?tabs=script%2Cwindowsinstall&pivots=programming-language-python
import azure.cognitiveservices.speech as speechsdk
languageCode = 'en-US'
ssmlGender = 'MALE'
voicName = 'en-US-JennyNeural'
speakingRate = '-5%'
pitch = '-10%'
voiceStyle = 'newscast'
azureKey = 'FAKE KEY'
azureRegion = 'FAKE REGION'
#############################################################
#audioOuputFile = './audioFiles/test.wav'
audioOuputFile = './audioFiles/test.mp3'
#############################################################
txt = 'Mary had a little lamb it\'s fleece was white as snow.'
txt+= 'And everywhere that Mary went, the lamb was sure to go,'
txt+= 'It followed her to school one day,'
txt+= 'That was against the rule,'
txt+= 'It made the children laugh and play,'
txt+= 'To see a lamb at school.'
head1 = f'<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="{languageCode}">'
head2 = f'<voice name="{voicName}">'
head3 =f'<mstts:express-as style="{voiceStyle}">'
head4 = f'<prosody rate="{speakingRate}" pitch="{pitch}">'
tail= '</prosody></mstts:express-as></voice></speak>'
ssml = head1 + head2 + head3 + head4 + txt + tail
print('this is the ssml======================================')
print(ssml)
print('end ssml======================================')
print()
speech_config = speechsdk.SpeechConfig(subscription=azureKey, region=azureRegion)
# AudioOutputConfig is the output-side audio config used by SpeechSynthesizer
audio_config = speechsdk.audio.AudioOutputConfig(filename=audioOuputFile)
# HERE IS THE PROBLEM:
# without this statement everything works fine
# and a WAV file is produced
speech_config.set_speech_synthesis_output_format(SpeechSynthesisOutputFormat["Audio16Khz128KBitRateMonoMp3"])
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
# .get() waits for synthesis to finish before the script exits
result = synthesizer.speak_ssml_async(ssml).get()
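A side note on the SSML assembly in the code above: concatenating raw strings works for this poem, but a character such as & or < in the text would make the SSML malformed, and the poem's lines run together ("snow.And"). The sketch below is one standard-library-only way to build the same markup with the text escaped and the lines joined by spaces, then check that it parses as XML before it is sent. build_ssml and is_well_formed are hypothetical helper names; the namespace URIs are copied from head1.

```python
from xml.etree import ElementTree
from xml.sax.saxutils import escape, quoteattr

# Namespace URIs copied from the question's head1 string.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "http://www.w3.org/2001/mstts"
EMO_NS = "http://www.w3.org/2009/10/emotionml"


def build_ssml(lines, lang, voice, style, rate, pitch):
    """Hypothetical helper: wrap text lines in speak/voice/express-as/prosody."""
    # Join with spaces so "snow." and "And" do not run together,
    # and escape &, < and > so the text cannot break the markup.
    text = escape(" ".join(line.strip() for line in lines))
    return (
        f'<speak xmlns="{SSML_NS}" xmlns:mstts="{MSTTS_NS}" xmlns:emo="{EMO_NS}" '
        f'version="1.0" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{text}"
        "</prosody></mstts:express-as></voice></speak>"
    )


def is_well_formed(xml_text):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ElementTree.fromstring(xml_text)
    except ElementTree.ParseError:
        return False
    return True


ssml = build_ssml(
    ["Mary had a little lamb, its fleece was white as snow.",
     "And everywhere that Mary went, the lamb was sure to go."],
    lang="en-US",
    voice="en-US-JennyNeural",
    style="newscast",
    rate="-5%",
    pitch="-10%",
)
print(is_well_formed(ssml))  # True
```

The resulting string can be passed to speak_ssml_async exactly like the hand-concatenated one.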
Here is the console output:
(envo) D:\py_new\tts>python ttsTest3.py
this is the ssml======================================
<mstts:express-as style="newscast">Mary had a little lamb it's fleece was white as snow.And everywhere that Mary went, the lamb was sure to go,It followed her to school one day,That was against the rule,It made the children laugh and play,To see a lamb at school.</mstts:express-as>
end ssml======================================

Traceback (most recent call last):
  File "D:\py_new\tts\ttsTest3.py", line 45, in <module>
    speech_config.set_speech_synthesis_output_format(SpeechSynthesisOutputFormat["Audio16Khz128KBitRateMonoMp3"])
NameError: name 'SpeechSynthesisOutputFormat' is not defined
(envo) D:\py_new\tts>
Note the error: NameError: name 'SpeechSynthesisOutputFormat' is not defined
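The NameError follows from the import statement: import azure.cognitiveservices.speech as speechsdk binds only the name speechsdk, so the enum has to be reached through it, or imported explicitly with from azure.cognitiveservices.speech import SpeechSynthesisOutputFormat. The sketch below reproduces the same situation with a standard-library enum, http.HTTPStatus, standing in for the SDK's SpeechSynthesisOutputFormat (a substitution made only so it runs without the SDK installed). It also shows that lookup by name with [...] and attribute access give the same member, assuming the SDK enum behaves like a standard Python Enum.

```python
import http as httpmod  # aliased import, like "import ... as speechsdk"


def unqualified_lookup():
    """Reach the enum by its bare name, as the failing statement does."""
    try:
        return HTTPStatus["OK"]  # noqa: F821 - the bare name was never bound
    except NameError as exc:
        return exc


def qualified_lookup(name):
    """Reach the enum through the alias that the import statement did bind."""
    return httpmod.HTTPStatus[name]


print(type(unqualified_lookup()).__name__)  # NameError
print(qualified_lookup("OK") is httpmod.HTTPStatus.OK)  # True
```

Applied to the script, the statement would then read, for example:
speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio16Khz128KBitRateMonoMp3)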
Compare with the "Customize audio format" section at:
It all works fine in Node.js, but I need to be able to do it in Python as well.
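Because the script switches between WAV and MP3 by editing audioOuputFile, one option is to derive the output format from the file extension so the two settings cannot disagree. This is a minimal sketch: output_format_name and FORMAT_BY_EXTENSION are hypothetical, and the helper returns enum member names (Riff16Khz16BitMonoPcm and Audio16Khz128KBitRateMonoMp3 from SpeechSynthesisOutputFormat) rather than members, so it runs without the SDK.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a SpeechSynthesisOutputFormat member name.
FORMAT_BY_EXTENSION = {
    ".wav": "Riff16Khz16BitMonoPcm",
    ".mp3": "Audio16Khz128KBitRateMonoMp3",
}


def output_format_name(path):
    """Return the enum member name matching the file's extension."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMAT_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"no output format configured for {suffix!r}") from None


print(output_format_name("./audioFiles/test.mp3"))  # Audio16Khz128KBitRateMonoMp3
```

With the SDK imported as in the question, the name could then be turned into a member with speechsdk.SpeechSynthesisOutputFormat[output_format_name(audioOuputFile)], assuming lookup by name works as it does for standard Python enums.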