1

I'm testing the Google Speech-to-Text API with streaming audio as well as with WAV files. I'm using telephony audio: 8000 Hz sample rate, 8 bits per sample, mu-law encoding. The Google configuration is set accordingly.

When I test it with normal sequences of words, it returns a correct transcription. However, when I say a single word (especially a number), I very often get no response from the API, as if there were no input. This happens with both streaming and batch transcription.

Does anybody know why this is happening and how to fix it?

ylvi-bux
  • 37
  • 6
  • 1
    Is it possible for you to transcode to LINEAR16 or FLAC encoding? The [best practices](https://cloud.google.com/speech-to-text/docs/best-practices) for the Cloud Speech-to-Text API suggest doing so. – Krish Dec 29 '21 at 12:26
  • Hi, consider accepting/upvoting the answer if you find it helpful. – Krish Jan 11 '22 at 10:04

1 Answer

1

The Cloud Speech-to-Text API best practices suggest using a lossless codec such as FLAC or LINEAR16. I tested with LINEAR16, and single-word utterances that are digits were transcribed correctly. So the solution would be to transcode the audio from mu-law to LINEAR16 (or FLAC) before sending it to the API.
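For reference, here is a minimal sketch of that transcoding step in plain Python, assuming the input is raw, headerless 8-bit G.711 mu-law bytes (for a WAV file you would need to strip the header first). The function names `mulaw_byte_to_linear16` and `mulaw_to_linear16` are my own and not part of any Google library. It is not necessarily how I ran my test, just one way to do the conversion without extra dependencies.

```python
import struct

# G.711 mu-law bias constant used by the standard decoding formula.
BIAS = 0x84


def mulaw_byte_to_linear16(u: int) -> int:
    """Decode one 8-bit mu-law value into a signed 16-bit PCM sample."""
    u = ~u & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80               # top bit: sign
    exponent = (u >> 4) & 0x07    # next 3 bits: segment number
    mantissa = u & 0x0F           # low 4 bits: step within the segment
    sample = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -sample if sign else sample


# Precompute all 256 possible values so decoding is a table lookup.
_DECODE_TABLE = [mulaw_byte_to_linear16(b) for b in range(256)]


def mulaw_to_linear16(data: bytes) -> bytes:
    """Convert raw mu-law bytes to little-endian 16-bit PCM (LINEAR16)."""
    samples = [_DECODE_TABLE[b] for b in data]
    return struct.pack("<%dh" % len(samples), *samples)
```

The returned bytes can then be sent with the encoding set to LINEAR16 and the sample rate left at 8000 Hz. The output is twice the size of the input, since each 8-bit sample becomes 16 bits.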

Krish
  • 752
  • 3
  • 10