
This is something of a two-part question.

I'm writing a Unity application that uses Google Speech-to-Text streaming recognition. I noticed that if a user keeps talking for about a minute without pausing, Google sends IsFinal even though they haven't paused yet.

It's unlikely that a user would speak for a full minute straight, but if they do, we had hoped to capture that in a single response. Is this behavior intentional? I've looked around but couldn't find a clear answer.

Also, when the user does pause, can we increase the number of seconds Google waits before sending IsFinal to around 3 seconds?

That way we can widen the window we wait before sending a response to the user, just in case they are not done yet.

Understandably, this might clash with the 1 minute limit.
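To make the desired behavior concrete, here is a minimal sketch of what we are after, done on the client side: every IsFinal segment is buffered, and the merged text is only released once nothing has arrived for about 3 seconds. This would also stitch together the segment that is cut off at the one-minute mark with whatever follows it. It is written in Python purely for illustration (the real project is C#), and `FinalResultAggregator`, `on_result`, and `poll` are hypothetical names, not part of Google's API or the linked library.

```python
from typing import List, Optional


class FinalResultAggregator:
    """Merges consecutive final transcripts until a quiet period has elapsed.

    Hypothetical helper for illustration; not part of any Google client library.
    """

    def __init__(self, quiet_seconds: float = 3.0) -> None:
        self.quiet_seconds = quiet_seconds
        self._segments: List[str] = []
        self._last_activity: Optional[float] = None

    def on_result(self, transcript: str, is_final: bool, now: float) -> None:
        # Any result, interim or final, is treated as "the user is still talking".
        self._last_activity = now
        if is_final:
            # Keep the finalized text; interim text is discarded because a
            # later final result supersedes it.
            self._segments.append(transcript.strip())

    def poll(self, now: float) -> Optional[str]:
        """Return the merged utterance once the quiet period has passed, else None."""
        if not self._segments or self._last_activity is None:
            return None
        if now - self._last_activity < self.quiet_seconds:
            return None
        utterance = " ".join(self._segments)
        self._segments.clear()
        self._last_activity = None
        return utterance
```

In Unity, `on_result` would be called from the streaming response handler and `poll` from `Update()`, passing something like `Time.time` as `now`. The timestamps are passed in explicitly here so the logic can be checked without a real clock.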

The library code I'm using to handle the streaming: https://github.com/oshoham/UnityGoogleStreamingSpeechToText/blob/master/Runtime/StreamingRecognizer.cs

SSal
  • Does [this](https://cloud.google.com/speech-to-text/docs/async-recognize) help? "*Asynchronous speech recognition* starts a long running audio processing operation. Use asynchronous speech recognition to recognize audio that is longer than a minute." – Philipp Lenssen Mar 19 '20 at 09:11
  • Thanks for responding. Unfortunately it wouldn't help: this solution would mean the audio has to be recorded into a file first and then sent for transcription. That would take longer to return results than streaming the audio as the user speaks, and it would be harder to know when the user has stopped speaking in order to end the recording. – SSal Mar 19 '20 at 12:16
