Absolute positioning of sentence start times in Azure speech synthesis using SSML

Question

I am using Azure speech synthesis, and I need sentences to start at exact times. The simplest option would be the media element, as mentioned in "SSML Speak tag Absolute value for begin attribute", but Azure does not support media at the moment, so I tried silence:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">
<voice name="en-US-JennyNeural">
<mstts:silence type="Leading" value="{seconds}s"/>
this should happen after {seconds} seconds
</voice>
<voice name="en-US-JennyNeural">
<mstts:silence type="Leading" value="{seconds}s"/>
second round this should happen after {2*seconds} seconds
</voice>
</speak>

But silence has a maximum limit of 5 seconds. Is there any other way to achieve something like this with Azure?

I had the same problem: I wanted a text-to-speech voice-over MP3 to merge with a video. The maximum limit is indeed 5 seconds, and you can't change that. If the interval or the leading silence is more than 5 seconds, you have a problem. I can help you with this if that is the issue. I don't know how to change your question; this is a specific programming problem. - Peter Adriaenssens, Sep 01 '23 at 15:29
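One possible workaround for the silence limit discussed above is to keep the timing outside SSML entirely: synthesize each sentence as its own clip, then place the clips on a timeline and fill the gaps with silence yourself. The sketch below is a minimal illustration of that idea, not a confirmed Azure recipe. It assumes each sentence has already been synthesized separately into raw 16-bit mono PCM at a known sample rate (for example by requesting a raw 16 kHz PCM output format from the service), and the names `place_clips` and `write_wav` are hypothetical helpers invented for this example. It uses only the Python standard library.

```python
import wave

SAMPLE_RATE = 16000  # assumption: clips are 16 kHz
SAMPLE_WIDTH = 2     # assumption: 16-bit (2-byte) mono samples


def place_clips(clips, sample_rate=SAMPLE_RATE):
    """Place raw PCM clips at absolute start times.

    clips: iterable of (start_seconds, pcm_bytes) pairs.
    Returns one PCM byte string with zero-valued silence filling the gaps.
    Raises ValueError if a clip would start before the previous one ends.
    """
    timeline = bytearray()
    for start, pcm in sorted(clips, key=lambda clip: clip[0]):
        # Byte offset of the requested start time, aligned to whole samples.
        offset = int(round(start * sample_rate)) * SAMPLE_WIDTH
        if offset < len(timeline):
            raise ValueError(f"clip at {start}s overlaps the previous clip")
        # Pad with silence up to the start time, then append the clip.
        timeline.extend(b"\x00" * (offset - len(timeline)))
        timeline.extend(pcm)
    return bytes(timeline)


def write_wav(path, pcm, sample_rate=SAMPLE_RATE):
    """Write raw mono PCM bytes to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
```

With this approach the gap between sentences is not bounded by any SSML limit, because the silence is generated locally. A call such as `write_wav("voiceover.wav", place_clips([(2.0, first_pcm), (7.5, second_pcm)]))` would produce a file in which the first sentence starts at 2.0 s and the second at 7.5 s, provided the first clip is shorter than 5.5 s; the result can then be converted to MP3 or merged with the video using a separate tool.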
