
The Google Speech-to-Text documentation recommends a 100 ms frame size as a tradeoff between latency and efficiency:

Any frame size is acceptable. Larger frames are more efficient, but add latency. A 100-millisecond frame size is recommended as a good tradeoff between latency and efficiency.
-Best Practices

However, I do not know what the frame size is. Is the frame size the same as AudioBuffer.length?


1 Answer


The frames are the chunks sent as StreamingRecognizeRequest messages, each of which can contain one of two fields: streaming_config or audio_content. The first StreamingRecognizeRequest message carries only the streaming_config; all subsequent messages carry audio_content.
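To make the message sequence concrete, here is a minimal sketch of how a 100 ms frame could be sized and sent. It is not the client library's API: plain dicts stand in for StreamingRecognizeRequest messages, the helper names frame_size_bytes and build_requests are hypothetical, and the audio is assumed to be 16 kHz, 16-bit mono PCM. On this reading, a frame is a duration of audio rather than AudioBuffer.length itself, which counts sample-frames; dividing that length by the sample rate gives the duration.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16000   # assumption: 16 kHz audio
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM
FRAME_MS = 100           # frame size recommended in the quoted best practices


def frame_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     frame_ms: int = FRAME_MS) -> int:
    """Number of bytes of raw PCM that cover frame_ms of audio."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * bytes_per_sample


def build_requests(pcm: bytes, config: dict) -> Iterator[dict]:
    """Yield the config-only message first, then one message per audio frame.

    Plain dicts stand in for StreamingRecognizeRequest messages here.
    """
    yield {"streaming_config": config}
    step = frame_size_bytes()
    for start in range(0, len(pcm), step):
        # The last chunk may be shorter than a full frame.
        yield {"audio_content": pcm[start:start + step]}


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # 1 s of silence
    # The config dict below is a placeholder, not a complete configuration.
    requests = list(build_requests(one_second, {"sample_rate_hertz": SAMPLE_RATE_HZ}))
    print(frame_size_bytes())  # 3200
    print(len(requests))       # 11: 1 config message + 10 audio frames
```

Under these assumptions a 100 ms frame is 1600 samples, or 3200 bytes, so one second of audio becomes one config message followed by ten audio messages.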

You can find more details in this and this documentation page.

dhauptman