On Chrome for Linux, code such as the following
speak('<?xml version="1.0"?><speak>Intro <break time="200ms"/>the rest.</speak>');
causes the TTS engine to read the XML markup aloud. Android browsers understand the SSML and insert a break.
I don't want to browser-sniff, but I can't see what test I could use to take advantage of SSML where it is understood and serve something plainer where it isn't.
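
To make "something plainer" concrete, here is a minimal sketch of the fallback side only. It assumes a hypothetical helper, ssmlToPlainText, that strips the markup and approximates each break with a comma. It does not detect SSML support, which is the part I am still missing.

```javascript
// Hypothetical helper (not part of any browser API): reduce an SSML string
// to plain text for engines that would otherwise read the markup aloud.
function ssmlToPlainText(ssml) {
  return ssml
    .replace(/<\?xml[^>]*\?>/g, '')       // drop the XML declaration
    .replace(/<break\b[^>]*\/?>/g, ', ')  // approximate a pause with a comma
    .replace(/<[^>]+>/g, '')              // drop any remaining tags
    .replace(/\s+,/g, ',')                // tidy whitespace before the inserted comma
    .replace(/\s+/g, ' ')                 // collapse runs of whitespace
    .trim();
}
```

With a working test in hand, the call would become something like speak(supportsSsml ? ssml : ssmlToPlainText(ssml)), where supportsSsml is a placeholder for the check I am asking about.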