When using the synthesizeToFile method of Android's TextToSpeech, how can we know what file format (WAV, MP3, OGG) and what attributes (sample rate, bit depth, etc.) the resulting file will have?
I can't find an explicit standard in the documentation; it doesn't even promise a particular file format such as WAV.
Is this simply up to each speech engine to implement however it chooses?
What if we want to do something with the result, like calculate the duration of the file? We would have to know the details of the file format in advance. This is made even more unpredictable by the fact that we can't know in advance which engine is installed and running on the end user's device.
Is there really no standard for this?
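
For context, the workaround I'm trying to avoid looks roughly like the sketch below: read the bytes that synthesizeToFile produced, guess the container from its magic bytes, and only compute a duration if it turns out to be a RIFF/WAVE file. This is plain Java with no Android dependencies. The class and method names (SynthOutputProbe, sniffFormat, wavDurationSeconds) are my own hypothetical helpers, not part of any Android API, and the code assumes that a WAV result carries a standard "fmt " chunk with a valid byte rate. It's a minimal sketch, not a robust parser.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: nothing here is provided by Android's TextToSpeech API.
public class SynthOutputProbe {

    // Read a 4-character ASCII tag (e.g. "RIFF") at the given offset.
    private static String tag(byte[] data, int off) {
        return new String(data, off, 4, StandardCharsets.US_ASCII);
    }

    /** Guess the container from leading magic bytes; returns "unknown" if nothing matches. */
    public static String sniffFormat(byte[] data) {
        if (data.length >= 12 && tag(data, 0).equals("RIFF") && tag(data, 8).equals("WAVE")) {
            return "wav";
        }
        if (data.length >= 4 && tag(data, 0).equals("OggS")) {
            return "ogg";
        }
        if (data.length >= 3 && data[0] == 'I' && data[1] == 'D' && data[2] == '3') {
            return "mp3"; // ID3v2 tag, usually followed by MPEG audio
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xE0) == 0xE0) {
            return "mp3"; // bare MPEG audio frame sync (11 set bits)
        }
        return "unknown";
    }

    /**
     * Duration in seconds for a RIFF/WAVE byte array, or -1 if the data is not
     * WAV or the "fmt "/"data" chunks cannot be found. Assumes the byte rate
     * stored in the "fmt " chunk is correct.
     */
    public static double wavDurationSeconds(byte[] data) {
        if (!"wav".equals(sniffFormat(data))) {
            return -1;
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        long byteRate = 0;
        long pos = 12; // first chunk starts right after "RIFF" <size> "WAVE"
        while (pos + 8 <= data.length) {
            String id = tag(data, (int) pos);
            long size = buf.getInt((int) pos + 4) & 0xFFFFFFFFL; // unsigned 32-bit
            if (id.equals("fmt ") && size >= 16 && pos + 24 <= data.length) {
                // Chunk body layout: format(2) channels(2) sampleRate(4) byteRate(4) ...
                byteRate = buf.getInt((int) pos + 16) & 0xFFFFFFFFL;
            } else if (id.equals("data")) {
                if (byteRate == 0) {
                    return -1;
                }
                // Some writers store a bogus size; never count past end of file.
                long available = data.length - (pos + 8);
                return (double) Math.min(size, available) / byteRate;
            }
            pos += 8 + size + (size & 1); // chunks are padded to even length
        }
        return -1;
    }
}
```

Even with something like this, I would have to special-case every container an engine might decide to emit, which is exactly why I'm hoping there is a documented guarantee somewhere.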