I’m currently working on a live translation web app that lets multiple participants use Azure Speech Translation and share their transcriptions in multiple languages.
I don’t want to be billed for the number of participants × the duration of a meeting. Hence the question: how can I activate recognition only when speech is detected? This way, I would only pay for the people who are actually speaking.
I tried to use the speechStartDetected event from the TranslationRecognizer class, but this event seems to fire only while the recognizer is already recognizing (with recognizeOnceAsync() or startContinuousRecognitionAsync()).
Is there any parameter within the Speech SDK I can use to achieve what I want? If not, what are my options?
It might be possible to watch the audio dB level and activate continuous recognition accordingly, but I think I will run into problems if I do it this way. For example, once the audio level stays above a certain threshold for a certain duration, this would trigger startContinuousRecognitionAsync(), but by then the recognizer would have missed the beginning of the speech…
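To make the dB idea concrete, here is a minimal sketch of what I have in mind, assuming I keep the last few audio frames in a small pre-roll buffer and replay them once the level has stayed above the threshold, so the start of the utterance is not lost. Everything here (PreRollGate, rmsDb, the option names and the default values) is hypothetical and my own; none of it comes from the Speech SDK.

```typescript
// Hypothetical helper, NOT part of the Speech SDK: an energy gate with pre-roll.
type Frame = Float32Array;

// RMS level of one frame in dBFS (samples assumed to be in [-1, 1]).
function rmsDb(frame: Frame): number {
  if (frame.length === 0) return -Infinity;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  const rms = Math.sqrt(sum / frame.length);
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

interface GateOptions {
  onStart: () => void;             // e.g. start continuous recognition
  onAudio: (frame: Frame) => void; // e.g. forward audio to the recognizer
  onStop: () => void;              // e.g. stop continuous recognition
  thresholdDb?: number;            // level counted as "loud" (assumed default)
  preRollFrames?: number;          // frames kept while the gate is closed
  openAfterFrames?: number;        // consecutive loud frames needed to open
  closeAfterFrames?: number;       // consecutive quiet frames needed to close
}

class PreRollGate {
  private preRoll: Frame[] = [];
  private loudRun = 0;
  private quietRun = 0;
  private open = false;
  private readonly opts: Required<GateOptions>;

  constructor(options: GateOptions) {
    this.opts = {
      thresholdDb: -40,
      preRollFrames: 25,
      openAfterFrames: 3,
      closeAfterFrames: 50,
      ...options,
    };
  }

  get isOpen(): boolean {
    return this.open;
  }

  push(frame: Frame): void {
    const loud = rmsDb(frame) >= this.opts.thresholdDb;

    if (!this.open) {
      // Gate closed: remember recent frames, dropping the oldest when full.
      this.preRoll.push(frame);
      if (this.preRoll.length > this.opts.preRollFrames) this.preRoll.shift();

      this.loudRun = loud ? this.loudRun + 1 : 0;
      if (this.loudRun >= this.opts.openAfterFrames) {
        // Open the gate and replay the buffered frames first, oldest to newest.
        this.open = true;
        this.quietRun = 0;
        this.opts.onStart();
        for (const f of this.preRoll) this.opts.onAudio(f);
        this.preRoll = [];
      }
      return;
    }

    // Gate open: forward audio and close after a long enough quiet run.
    this.opts.onAudio(frame);
    this.quietRun = loud ? 0 : this.quietRun + 1;
    if (this.quietRun >= this.opts.closeAfterFrames) {
      this.open = false;
      this.loudRun = 0;
      this.opts.onStop();
    }
  }
}
```

My assumption is that onAudio could write into a push stream (AudioInputStream.createPushStream() passed to AudioConfig.fromStreamInput()), with onStart and onStop calling startContinuousRecognitionAsync() and stopContinuousRecognitionAsync(). The conversion from Float32 frames to the PCM format the stream expects is not shown, and I have not verified how the recognizer or the billing behaves with audio that is pushed in a burst like this.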
Thanks in advance!