I am using Azure Speech synthesis, and I need sentences to start at exact times. The simplest option would be media, as mentioned in "SSML Speak tag Absolute value for begin attribute", but Azure does not support media at the moment, so I tried silence:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">
<voice name="en-US-JennyNeural">
<mstts:silence type="Leading" value="{seconds}s"/>
this should happen after {seconds} seconds
</voice>
<voice name="en-US-JennyNeural">
<mstts:silence type="Leading" value="{seconds}s"/>
second round this should happen after {2*seconds} seconds
</voice>
</speak>
But silence has a maximum limit of 5 seconds. Is there any other way to achieve something like this using Azure?
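To make the limitation concrete, here is a minimal sketch of how the SSML above could be generated from a list of (silence, text) pairs. The helper name `build_ssml` and the constant `MAX_SILENCE_SECONDS` are hypothetical, the 5-second cap is taken from the observation above rather than from a verified service limit, and the code only builds the string; it does not call the Azure SDK.

```python
from xml.sax.saxutils import escape

# Cap as observed in the question; an assumption, not a verified service limit.
MAX_SILENCE_SECONDS = 5.0


def build_ssml(segments, voice="en-US-JennyNeural"):
    """Build SSML with one <voice> element per (leading_silence_seconds, text) pair.

    Hypothetical helper for illustration only.
    """
    parts = [
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">'
    ]
    for seconds, text in segments:
        # Reject values outside the cap instead of letting them be silently ignored.
        if not 0 <= seconds <= MAX_SILENCE_SECONDS:
            raise ValueError(
                f"leading silence of {seconds}s is outside 0..{MAX_SILENCE_SECONDS}s"
            )
        parts.append(f'<voice name="{voice}">')
        parts.append(f'<mstts:silence type="Leading" value="{seconds:g}s"/>')
        parts.append(escape(text))  # escape &, <, > so the SSML stays well-formed
        parts.append("</voice>")
    parts.append("</speak>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_ssml([(3, "this should happen after 3 seconds"),
                      (3, "second round")]))
```

Any sentence that has to start more than 5 seconds after the previous one ends makes this helper raise, which is exactly the case I cannot express with silence alone.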