Azure speech to text service - what is continuous recognition doing with the audio on file

Question

We are comparing two speech to text services to present the pros and cons of each. With one service we upload a file and check its status via a GET request, downloading the transcripts once the returned status is done. This allows us to 'fire and forget': it frees local resources, and we can re-allocate them when it suits.

We have set up an Azure continuous recognition process but are not sure what is going on under the hood. It seems we have to keep a connection open the whole time the ASR is processing; when it receives some signal of completion (input exhausted), the connection is closed. We are not sure whether the file is uploaded in chunks, as a continuous stream, or in its entirety. Can this ever be fire and forget?

If someone can shed some light on the process or even point to the documentation where more in depth info is available, I'd be much obliged.

score 0 · Answer 1 · answered May 19 '22 at 11:56

While the speech to text conversion is running, whether a local resource has to stay available depends on the input type. As mentioned in the question, an uploaded file can be split into fixed-size chunks. The input can also arrive as a continuous stream.

Upload: Upload the file; depending on its size, it can then be split into chunks. This is the alternative to a continuous stream.

Continuous Stream: With this type of input, the resource cannot rest and has to stay active while the recognition is running.

For better resource allocation, it is recommended to use the REST API services described in the documentation:

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-speech-to-text
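
To make the fire-and-forget option concrete, here is a minimal Python sketch of a submit-then-poll flow against the REST API linked above. It is an illustration under assumptions, not a verified client: the `/transcriptions` path, the `contentUrls`, `locale` and `displayName` fields, the `self` link and the `Succeeded`/`Failed` status values reflect my understanding of the v3.0 batch transcription API and should be checked against the documentation. The helper names `http_json`, `submit_job` and `wait_for_job` are hypothetical.

```python
import json
import time
import urllib.request


def http_json(method, url, key, body=None):
    """Send one JSON request with the subscription key header and decode the reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Ocp-Apim-Subscription-Key", key)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def submit_job(base_url, key, audio_url, locale="en-US", http=http_json):
    """Create a transcription job and return the URL used to poll it.

    Assumption: the service answers with a JSON document whose "self"
    field is the job URL.
    """
    body = {
        "contentUrls": [audio_url],   # assumed field: audio is referenced by URL
        "locale": locale,
        "displayName": "batch job",
    }
    created = http("POST", base_url + "/transcriptions", key, body)
    return created["self"]


def wait_for_job(job_url, key, interval=30.0, max_polls=120,
                 http=http_json, sleep=time.sleep):
    """Poll the job until it reaches a terminal status and return the job document."""
    for _ in range(max_polls):
        job = http("GET", job_url, key)
        # Assumed terminal status values; anything else means "keep waiting".
        if job["status"] in ("Succeeded", "Failed"):
            return job
        sleep(interval)
    raise TimeoutError("job did not finish within the polling budget")
```

In this flow nothing stays connected between polls: `submit_job` returns as soon as the job is created, and `wait_for_job` can run later, from another process, or be replaced by a scheduled status check. The `http` and `sleep` parameters exist only so the sketch can be exercised without a real subscription.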
