Google Speech-to-Text not working correctly with very short audio (single words)

Question

I'm testing the Google Speech-to-Text API with streaming audio as well as with WAV files. I'm using telephony audio: 8000 Hz sample rate, 8 bits per sample, mu-law encoding. The Google configuration is set to match.

When I test it with normal sentences, it returns a correct transcription. However, when I say a single word (especially a number), I very often get no response from the API, as if there were no input. This happens with both streaming and batch transcription.

Does anybody know why this is happening and how to fix it?

Is it possible for you to transcode to LINEAR16 or FLAC encoding? The [best practices](https://cloud.google.com/speech-to-text/docs/best-practices) for the Cloud Speech-to-Text API recommend this. — Krish, Dec 29 '21 at 12:26
Hi, consider accepting/upvoting the answer if you find it helpful. — Krish, Jan 11 '22 at 10:04

Krish · Accepted Answer · 2022-01-11T06:00:20.877

The Cloud Speech-to-Text API best practices suggest using a lossless codec such as FLAC or LINEAR16. I verified with LINEAR16 and it works for single-word utterances that are digits. So the solution would be to transcode the audio to LINEAR16 (or FLAC) before sending it to the API.
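In case it helps, here is a minimal sketch of the transcoding step in pure Python. It assumes the input is raw, headerless 8-bit G.711 mu-law bytes (strip any WAV header first), and the function names `ulaw_to_linear16` and `write_wav` are my own, not part of any Google library. It only shows one way to get 16-bit PCM; a tool such as ffmpeg or sox would do the same job.

```python
import struct
import wave


def _ulaw_byte_to_linear(u: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~u & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


# Precompute all 256 possible values so decoding is a table lookup.
_ULAW_TABLE = [_ulaw_byte_to_linear(b) for b in range(256)]


def ulaw_to_linear16(ulaw_bytes: bytes) -> bytes:
    """Convert raw mu-law bytes to little-endian 16-bit PCM (LINEAR16)."""
    samples = [_ULAW_TABLE[b] for b in ulaw_bytes]
    return struct.pack("<%dh" % len(samples), *samples)


def write_wav(path: str, pcm: bytes, sample_rate: int = 8000) -> None:
    """Write mono 16-bit PCM to a WAV file (for the batch/file case)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
```

After converting, the request config would need `encoding` set to `LINEAR16` instead of `MULAW`, with `sample_rate_hertz` left at 8000, since this sketch does not resample. For streaming, each mu-law chunk can be passed through `ulaw_to_linear16` before it is sent.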

edited Jan 11 '22 at 06:00

answered Jan 11 '22 at 05:54

Krish
