Microsoft Azure TTS Cognitive Service Voice Limit Issue

Question

I am very new to the Text-to-Speech (TTS) cognitive service of Microsoft Azure. I can successfully convert a given text into an audio file using Azure's TTS service. It works fine when my SSML document contains a single voice element. An example of working SSML:

<speak version="1.0" xml:lang="en-US">
  <voice xml:lang="en-US" xml:gender="Male" name="en-US-Jessa24kRUS"> 
       Hello, this is my sample text to convert into audio? 
  </voice>
</speak>

But when the document contains multiple voice tags (alternating by gender), the request fails. The SSML is:

<speak version="1.0" xml:lang="en-US">
  <voice xml:lang="en-US" xml:gender="Male" name="en-US-Guy24kRUS"> What’s your name? </voice>
  <voice xml:lang="en-US" xml:gender="Female" name="en-US-Jessa24kRUS"> My name is Cindy Smith. Do you know John Silver?</voice>
  <voice xml:lang="en-US" xml:gender="Male" name="en-US-Guy24kRUS"> John and I are old friends. </voice>
  <voice xml:lang="en-US" xml:gender="Female" name="en-US-Jessa24kRUS"> John just joined our company as a salesperson. </voice>
  <voice xml:lang="en-US" xml:gender="Male" name="en-US-Guy24kRUS"> That’s good news. John has been a salesperson for chemical products for many years. </voice>
  <voice xml:lang="en-US" xml:gender="Female" name="en-US-Jessa24kRUS"> I heard he really likes his new job.</voice>
</speak>

And the error is:

Response status code does not indicate success: 400 (SSML must contain a maximum of 5 voice elements. Actual 6.).

It would be a great help if someone could explain why the service limits me to five voice tags when the documentation mentions no such limitation.

I'm really stuck on this and it's driving me crazy. I haven't been able to find a solution anywhere yet, and my deadlines are near. :-( – Arsman Ahmad, Feb 03 '20 at 07:30
Do you have any idea how we can contact Microsoft about an issue? – Arsman Ahmad, Feb 03 '20 at 07:33
I don't think you'll get anywhere with Microsoft. You'd better split your dialog into chunks and then glue them together. Or try other APIs like Amazon Polly. – Nikolay Shmyrev, Feb 03 '20 at 07:35
Actually, a single `SSML` document can contain only one `speak` tag, and it returns a single audio file. If I generate multiple `SSML` documents, the service will return multiple audio files, which isn't feasible in my case. – Arsman Ahmad, Feb 03 '20 at 07:38
You can use the feedback block at the end of the docs page that you mentioned to ask your question. But given the error you get, I guess this is a real limitation (which the documentation should mention, I agree). – Nicolas R, Feb 04 '20 at 12:59

score 1 · Answer 1 · answered Feb 05 '20 at 08:26

This is a known setting, put in place because of latency. We are aware of it and are working on removing this limitation. We hope to complete the fix and deployment this week, or earlier if things go smoothly.

Ram

I have implemented a bad workaround just because of this limitation. I'm creating multiple chunks of 5 voices and calling the API in a loop. After getting the responses, I combine them into a single file. How rude am I!! – Arsman Ahmad Feb 05 '20 at 18:16
And please don't forget to ping me once this limitation has been removed. Thank you. – Arsman Ahmad Feb 06 '20 at 07:57
Hello @Ram, any news about the bug fix? – Arsman Ahmad Feb 12 '20 at 06:55
The deployment is completed. Please try it now. – Ram Feb 24 '20 at 09:18
Thank you for your response. I'll ping you once I test it. – Arsman Ahmad Feb 24 '20 at 10:42
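
For readers who hit a similar limit, the chunking workaround described in the comments above could be sketched roughly as follows. This is only an illustration, not the code used by anyone in this thread: the function name `split_ssml` and the `max_voices` parameter are hypothetical, it assumes every `voice` element is a direct child of `speak`, and it assumes the document declares no default `xmlns`, as in the SSML from the question. It only splits the SSML; sending each chunk to the service and joining the returned audio is left out, because that depends on the client and the output format.

```python
import xml.etree.ElementTree as ET
from typing import List


def split_ssml(ssml: str, max_voices: int = 5) -> List[str]:
    """Split one SSML document into several documents, each holding
    at most max_voices direct children (the <voice> elements)."""
    root = ET.fromstring(ssml)
    voices = list(root)  # direct children of <speak>, in document order

    chunks = []
    for start in range(0, len(voices), max_voices):
        # Rebuild a <speak> root carrying the same attributes
        # (version, xml:lang) as the original document.
        speak = ET.Element(root.tag, dict(root.attrib))
        speak.extend(voices[start:start + max_voices])
        chunks.append(ET.tostring(speak, encoding="unicode"))
    return chunks
```

With the six-voice dialog from the question, this would yield two documents, one with five `voice` elements and one with the remaining element, and each could then be submitted as a separate request.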
