The Google Speech-to-Text documentation recommends a 100 ms frame size as a tradeoff between latency and efficiency:
Any frame size is acceptable. Larger frames are more efficient, but add latency. A 100-millisecond frame size is recommended as a good tradeoff between latency and efficiency.
-Best Practices
However, I do not know what "frame size" means here. Is it the same as `AudioBuffer.length`?
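
To make the question concrete, here is a minimal sketch of the arithmetic as I currently understand it. It assumes that "frame size" in the documentation means the duration of each audio chunk sent to the API, which is my interpretation and not something the quoted passage states. `AudioBuffer.length` is a count of sample-frames, so converting between the two requires the sample rate. The helper names and the 16 kHz, 16-bit mono format are hypothetical example values.

```javascript
// Hypothetical helpers; not part of the Web Audio or Speech-to-Text APIs.

// Number of sample-frames needed to cover `ms` milliseconds of audio.
function samplesForDuration(sampleRate, ms) {
  return Math.round((sampleRate * ms) / 1000);
}

// Duration in milliseconds of `length` sample-frames,
// where `length` is a value such as AudioBuffer.length.
function durationMs(length, sampleRate) {
  return (length * 1000) / sampleRate;
}

// Assumed example format: 16 kHz, mono, 16-bit linear PCM.
const sampleRate = 16000;
const samplesPerChunk = samplesForDuration(sampleRate, 100); // 1600 sample-frames
const bytesPerChunk = samplesPerChunk * 2; // 2 bytes per sample, one channel: 3200 bytes
```

Under that assumption, an `AudioBuffer` would only correspond to one 100 ms frame if `durationMs(buffer.length, buffer.sampleRate)` came out to roughly 100.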