SSML tags, specifically the <media> tag, allow you to specify a begin attribute, which denotes the point in time at which the current text should be placed in a sequence of text.
The value for begin can be an offset from the end of the prior media tag:
<media begin='2s'><speak>Speak whatever you need here</speak></media>
or it can reference another media element and be offset relative to that element:
<media xml:id='startnode'><speak>This starts at 0 seconds</speak></media>
<media begin='startnode.end+2s'><speak>This starts 2 seconds after the 'startnode' media element finishes</speak></media>
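For context, here is a minimal sketch of how the two elements above might sit in a complete document. The outer <speak> root and the <par> container are assumptions on my part (the snippets above show only the <media> elements), based on the layout Google's Assistant SSML uses for media containers:

```xml
<!-- Assumed wrapper: a <speak> root with a <par> media container. -->
<speak>
  <par>
    <!-- No begin attribute, so this element starts at 0 seconds. -->
    <media xml:id="startnode">
      <speak>This starts at 0 seconds</speak>
    </media>
    <!-- Syncbase value: begins 2 seconds after 'startnode' ends. -->
    <media begin="startnode.end+2s">
      <speak>This starts 2 seconds after the 'startnode' media element finishes</speak>
    </media>
  </par>
</speak>
```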
Obviously, the first media tag will have a start time of 0 seconds.
What if I want to start a specific media tag at an absolute time, say 4 seconds or 45 seconds, measured only from the start of the audio?