Is there a limit to the amount of text which can be submitted to the TTS (neural) Speech Service endpoints?
All of the requests I'm making from an Azure Function are successful but have a cutoff at 10 minutes exactly.
Yes. The old Bing Speech API documentation states that the Speech Service limits the duration of WebSocket connections to the service: a maximum of 10 minutes for an active WebSocket connection and a maximum of 180 seconds for an inactive one.
UPDATE
It is also stated in the new Speech Service documentation that an access token is valid for 10 minutes.
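Since the token is only valid for 10 minutes, a long-running client has to fetch a fresh one before it expires. A minimal sketch of that idea, assuming you supply your own fetch_token function that requests a token from the service; TokenCache, fetch_token and the one-minute refresh margin are hypothetical names and values for illustration, not part of any SDK:

```python
import time
from typing import Callable, Optional

TOKEN_LIFETIME_SECONDS = 10 * 60  # validity stated in the Speech Service docs
REFRESH_MARGIN_SECONDS = 60       # assumption: refresh one minute early


class TokenCache:
    """Caches an access token and re-fetches it shortly before it expires."""

    def __init__(self, fetch_token: Callable[[], str],
                 clock: Callable[[], float] = time.monotonic):
        # fetch_token is a hypothetical callable that returns a new token.
        self._fetch_token = fetch_token
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        age = self._clock() - self._fetched_at
        expired = age >= TOKEN_LIFETIME_SECONDS - REFRESH_MARGIN_SECONDS
        if self._token is None or expired:
            self._token = self._fetch_token()
            self._fetched_at = self._clock()
        return self._token
```

Call cache.get() before each request instead of storing the token yourself. Note that this only addresses token expiry, not the length of the synthesized audio.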
If you are using JavaScript, the docs say:
JavaScript service wrapper for Microsoft Speech API. It is an implementation of the Speech Websocket API specifically, which supports long speech recognition up to 10 minutes in length.
TTS documentation says: Asynchronous synthesis of long audio: Use the batch synthesis API (Preview) to asynchronously synthesize text-to-speech files longer than 10 minutes.
Batch synthesis API documentation says: The Batch synthesis API ... can synthesize a large volume of text input (long and short) asynchronously... create synthesized audio longer than 10 minutes.
So I believe this implies that the synchronous TTS API can handle only up to 10 minutes of audio. In my case, synthesizing a long text gave me HTTP status code 200 with the response being sent via chunked transfer encoding, and after about 10 seconds it failed with System.Net.Http.HttpRequestException: Error while copying content to a stream. ---> System.IO.IOException: The response ended prematurely. So I think the TTS backend was generating the audio from the text, and once the audio became longer than 10 minutes, it threw an exception and closed the connection.
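If the batch synthesis API is not an option, one possible workaround is to split the text into smaller pieces and synthesize each one separately, so that no single response approaches 10 minutes of audio. A rough sketch of the splitting step only; the speaking rate of 150 words per minute and the 8-minute target are my assumptions, not values from the documentation, and split_for_tts is a hypothetical helper. The real limit is on audio duration, so the word count is only an estimate:

```python
import re
from typing import List

WORDS_PER_MINUTE = 150  # assumption: rough speaking rate, varies by voice
MAX_MINUTES = 8         # assumption: safety margin below the 10-minute cutoff


def split_for_tts(text: str,
                  max_words: int = WORDS_PER_MINUTE * MAX_MINUTES) -> List[str]:
    """Split text at sentence boundaries into chunks of at most max_words words.

    A single sentence longer than max_words is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if n == 0:
            continue
        # Close the current chunk if this sentence would push it over the limit.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk would then go to the synchronous TTS endpoint as its own request, and the resulting audio files would be concatenated afterwards.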