2

What is the fastest response time I can expect from the Google Speech API with streaming audio data? I am sending an audio stream to the API and receiving the interim results with a 2000 ms delay, which I was hoping to get below 1000 ms. I have tested different sampling rates and different voice models.

Nikolay Shmyrev
Harry Stuart

3 Answers

1

I'm afraid that response time can't be measured or guaranteed because of the nature of the service. We don't know what is done under the hood. In fact, there is no SLA for response time, even though there is an SLA for availability.

What can help is building a good request:

  1. Using a frame size of around 100 milliseconds, for example, gives a good tradeoff between latency and efficiency.
  2. Following the Best Practices will help you make a clean request, so that latency can be reduced.
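As a rough illustration of the first point, here is a minimal sketch that works out how many bytes a 100 ms frame occupies and splits raw audio into frames of that size. It assumes uncompressed 16-bit mono PCM at 16 kHz; the function names `frame_size_bytes` and `iter_frames` are made up for this example and are not part of any Google client library.

```python
from typing import Iterator


def frame_size_bytes(sample_rate_hz: int, frame_ms: int = 100,
                     bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes of raw PCM audio covering frame_ms milliseconds."""
    # Compute whole samples first so a frame never splits a sample.
    samples = sample_rate_hz * frame_ms // 1000
    return samples * bytes_per_sample * channels


def iter_frames(pcm: bytes, frame_bytes: int) -> Iterator[bytes]:
    """Yield successive fixed-size chunks; the last one may be shorter."""
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


if __name__ == "__main__":
    size = frame_size_bytes(16000)      # assumed: 16 kHz, 16-bit, mono
    one_second = bytes(16000 * 2)       # one second of silence as stand-in audio
    frames = list(iter_frames(one_second, size))
    print(size, len(frames))            # 3200 10
```

Each yielded chunk would then be written to the streaming request as soon as it is available, rather than accumulated into larger writes.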

You may want to check the following links on specific use cases to see how they addressed latency issues:

rsantiago
0

If you really care about response time, you'd be better off running a Kaldi-based service on your own infrastructure. Something like https://github.com/alumae/kaldi-gstreamer-server together with https://github.com/Kaljurand/dictate.js

Nikolay Shmyrev
  • How much faster is this, and why? Could it respond in under 1000 ms? What response times are reported? I tested their service at https://bark.phon.ioc.ee/dictate/ as per the link on their GitHub page. It seemed to be slower than Google's Speech API. I would like to be fairly certain that using this would be a significant improvement, as I anticipate that setting it up will take a good amount of time. – Harry Stuart Sep 17 '18 at 14:36
  • It will be faster because you run it on your own server and can control the load. If the load is excessive, you just scale the workers. And it will be an order of magnitude cheaper than Google. – Nikolay Shmyrev Sep 17 '18 at 18:35
0

Google Cloud Speech itself works pretty fast; you can check how quickly your microphone input gets transcribed at https://cloud.google.com/speech-to-text/.

You are probably experiencing a buffering issue on your side: the tool you are using may buffer data before sending it (flushing the buffer) to the underlying device (stream).

Find out how to decrease that tool's output buffer to a lower value, e.g. 2 KB, so that data reaches the Node app and the Google service faster. Google recommends sending chunks that correspond to about 100 ms of audio.
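To see why the buffer size matters, here is a small sketch that estimates how long a full output buffer holds audio back before it is flushed. It assumes uncompressed 16-bit mono PCM at 16 kHz, and `buffer_delay_ms` is a hypothetical helper name. The 64 KB figure is only an illustrative buffer size, not a known setting of any particular tool.

```python
def buffer_delay_ms(buffer_bytes: int, sample_rate_hz: int = 16000,
                    bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Milliseconds of audio that fit in the buffer, i.e. the delay added
    if the buffer is only flushed once it is full."""
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return buffer_bytes * 1000 / bytes_per_second


if __name__ == "__main__":
    print(buffer_delay_ms(2 * 1024))    # 64.0   (2 KB buffer)
    print(buffer_delay_ms(3200))        # 100.0  (a 100 ms chunk)
    print(buffer_delay_ms(64 * 1024))   # 2048.0 (illustrative 64 KB buffer)
```

Under these assumptions, a buffer in the tens of kilobytes would by itself add a delay on the order of seconds, while a 2 KB buffer adds well under 100 ms.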

Maksim Shamihulau