Google Speech-to-Text API: missing or poor confidence for speech recognition

Question

I am using Google's speech recognition API through the Python speech_recognition package.

I am using 2.5-second audio samples. Below is an example of output in which the confidence is missing for every alternative:

{u'alternative': [{u'transcript': u'if Carol comes tomorrow have a'}, {u'transcript': u'if Carroll comes tomorrow never'}, {u'transcript': u'if Carroll comes tomorrow have a'}, {u'transcript': u'if Carole comes tomorrow have a'}, {u'transcript': u'if care comes tomorrow have a'}, {u'transcript': u'if Carroll comes tomorrow however'}, {u'transcript': u'if girl comes tomorrow have a'}, {u'transcript': u'is Carroll comes tomorrow have a'}, {u'transcript': u'if call comes tomorrow have a'}, {u'transcript': u'Carol comes tomorrow have a'}, {u'transcript': u'if kevin comes tomorrow have a'}, {u'transcript': u'if Carroll comes tomorrow have'}, {u'transcript': u'if korea comes tomorrow have a'}, {u'transcript': u'if Carroll come tomorrow have a'}, {u'transcript': u'if cry comes tomorrow have a'}], u'final': True}

The original sample is partially cut at the end, but definitely says: "if Carol comes tomorrow have a..."
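Because this clip ends mid-phrase, it may be worth confirming how long each sample really is before blaming the recognizer. The sketch below assumes the samples are uncompressed PCM WAV files and uses only Python's standard wave module. wav_duration_seconds and make_silent_wav are hypothetical helper names; the latter exists only to build a test clip in memory.

```python
import io
import wave


def wav_duration_seconds(path_or_file) -> float:
    """Return the playback length of a PCM WAV file (path or file object) in seconds."""
    with wave.open(path_or_file, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Hypothetical demo helper: build a silent mono 16-bit WAV clip in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(rate * seconds))
    buf.seek(0)  # rewind so the clip can be read back
    return buf


if __name__ == "__main__":
    print(wav_duration_seconds(make_silent_wav(2.5)))  # 2.5
```

In practice you would pass target0_path (or any sample path) instead of the in-memory clip.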

In 95% of the cases, I get a confidence value only for the very first alternative; it is omitted for all the others:

{u'alternative': [{u'confidence': 0.91297865, u'transcript': u'by that time perhaps something better can'}, {u'transcript': u'by that time perhaps something better came'}, {u'transcript': u'by that time perhaps something better Kim'}, {u'transcript': u'but that time perhaps something better can'}, {u'transcript': u'by that time perhaps something better come'}], u'final': True}

Here the sentence is "By that time perhaps something better can be", so the first transcript is essentially accurate.
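With show_all=True, recognize_google returns the raw response as a Python dict (the u'' prefixes above are just Python 2 string reprs), or, in the speech_recognition versions we are aware of, an empty list when nothing was recognized. The hypothetical helpers below flatten such a response into (transcript, confidence) pairs, using None where the confidence key is absent, and classify each response so the "95% of the cases" observation can be tallied over a batch of clips. The key names are taken from the outputs shown above.

```python
from typing import Any, List, Optional, Tuple


def alternatives_with_confidence(response: Any) -> List[Tuple[str, Optional[float]]]:
    """Flatten a show_all=True response into (transcript, confidence) pairs.

    confidence is None for any alternative that has no "confidence" key.
    A non-dict response (e.g. [] when nothing was recognized) yields [].
    """
    if not isinstance(response, dict):
        return []
    return [
        (alt.get("transcript", ""), alt.get("confidence"))
        for alt in response.get("alternative", [])
    ]


def confidence_coverage(response: Any) -> str:
    """Return 'none', 'top_only', 'all' or 'partial' depending on which
    alternatives carry a confidence value."""
    flags = [conf is not None for _, conf in alternatives_with_confidence(response)]
    if not any(flags):
        return "none"
    if all(flags):
        return "all"
    if flags[0] and not any(flags[1:]):
        return "top_only"
    return "partial"


if __name__ == "__main__":
    # Shortened copy of the second response from the question.
    second = {
        "alternative": [
            {"confidence": 0.91297865, "transcript": "by that time perhaps something better can"},
            {"transcript": "by that time perhaps something better came"},
        ],
        "final": True,
    }
    print(confidence_coverage(second))  # top_only
```

Feeding confidence_coverage(result) for every clip into a collections.Counter would give the proportion of "none" versus "top_only" responses directly.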

Just in case, this is how I run the evaluation in Python:

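import speech_recognition as sr

r = sr.Recognizer()
# target0_path: path to one of the 2.5 s WAV samples
with sr.AudioFile(target0_path) as source:  # AudioFile supersedes the older WavFile alias
    audio = r.record(source)
# show_all=True returns the raw response dict with every alternative
result = r.recognize_google(audio, key=None, language="en-US", show_all=True)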

Do you have any ideas or advice? Are there particular settings I could use to avoid the problem?

Don't use the speech_recognition Python module; use Google's streaming API directly. (Nikolay Shmyrev, Apr 13 '19 at 20:56)
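Following the comment's suggestion, here is a minimal sketch that calls Google Cloud Speech-to-Text directly through the google-cloud-speech client library. It uses the synchronous recognize method, which suits clips this short, rather than the streaming API the comment mentions. It assumes 16 kHz mono LINEAR16 WAV input, configured Google Cloud credentials, and a client library version that exposes speech.SpeechClient; recognize_wav and collect_alternatives are hypothetical names. As we read the Cloud reference documentation, confidence is only guaranteed for the top alternative and 0.0 is a "not set" sentinel, so the helper maps 0.0 to None; calling the API directly may therefore not make confidences appear for the other alternatives either.

```python
from typing import Any, List, Optional, Tuple


def collect_alternatives(response: Any) -> List[Tuple[str, Optional[float]]]:
    """Flatten a RecognizeResponse into (transcript, confidence) pairs.

    A confidence of 0.0 is treated as "not set" and mapped to None.
    """
    pairs: List[Tuple[str, Optional[float]]] = []
    for result in response.results:
        for alt in result.alternatives:
            pairs.append((alt.transcript, alt.confidence or None))
    return pairs


def recognize_wav(path: str, sample_rate_hertz: int = 16000,
                  max_alternatives: int = 5) -> List[Tuple[str, Optional[float]]]:
    """Send a short LINEAR16 WAV clip to Cloud Speech-to-Text (needs credentials)."""
    # Imported lazily so collect_alternatives works without the client library.
    from google.cloud import speech

    client = speech.SpeechClient()
    with open(path, "rb") as f:
        content = f.read()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hertz,  # assumed 16 kHz; match your files
        language_code="en-US",
        max_alternatives=max_alternatives,
    )
    audio = speech.RecognitionAudio(content=content)
    response = client.recognize(config=config, audio=audio)
    return collect_alternatives(response)
```

Usage would be something like recognize_wav(target0_path), which returns pairs in the same shape as the show_all helper, so results from both routes can be compared side by side.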
