How to get entire transcript using google.cloud.speech_v1p1beta1?

Question

Using Google Speech-to-Text, I only get a partial transcription. The input file is one of Google's sample audio files.

Link to its location in the Google repo: commercial_mono.wav

Here is my code:

def transcribe_gcs(gcs_uri):
    from google.cloud import speech_v1p1beta1 as speech
    from google.cloud.speech_v1p1beta1 import enums
    from google.cloud.speech_v1p1beta1 import types
    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = speech.types.RecognitionConfig(
        language_code='en-US',
        enable_speaker_diarization=True,
        diarization_speaker_count=2)
    operation = client.long_running_recognize(config, audio)

    print('Waiting for operation to complete...')
    response = operation.result(timeout=5000)
    result = response.results[-1]

    words_info = result.alternatives[0].words

    tag = 1
    speaker = " "

    for word_info in words_info:
        if word_info.speaker_tag == tag:
            speaker = speaker + " " + word_info.word
        else:
            print("speaker {}: {}".format(tag, speaker))
            tag = word_info.speaker_tag
            speaker = " " + word_info.word

Here is how I call the script:

transcribe_gcs('gs://mybucket0000t/commercial_mono.wav')

I only get a partial transcription of the audio file:

(venv3) ➜  g-transcribe git:(master) ✗ python gtranscribeWithDiarization.py
Waiting for operation to complete...
speaker 1:   I'm here
speaker 2:  hi I'd like to buy a Chrome Cast and I was wondering whether you could help me

That's all I get.

If I run the code repeatedly, after 5 or 6 runs I no longer receive any transcription at all.

Here is the result after a few tries:

(venv3) ➜  g-transcribe git:(master) ✗ python gtranscribeWithDiarization.py

Waiting for operation to complete...
speaker 1:  

(venv3) ➜  g-transcribe git:(master) ✗

Environment: Python 3

I am using a Google service account and have no connectivity issues.
I also copied the file to Google Cloud Storage and confirmed that it plays.
I tried converting the file from WAV to FLAC, but the results are the same.
I used ffprobe to confirm that there is only one channel.

I am trying to get the entire transcription, with a timestamp each time the speaker changes.

Desired output

Speaker 1: Start Time 0.0001: Hello transcription starts
Speaker 2: Start Time 0.0009: Here starts with the transcription of the 2nd speaker and so on to the end of file.
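
For illustration, output of this shape could be assembled roughly as follows. This is a minimal sketch under stated assumptions, not the API's own method: `Word` is a hypothetical stand-in for the per-word objects in `result.alternatives[0].words`, and `start_seconds` assumes each word's start time is already available as a float number of seconds. Note that a loop which prints only when the speaker tag changes never emits the final speaker's words; collecting segments first avoids that.

```python
from collections import namedtuple

# Hypothetical stand-in for the API's per-word objects (an assumption for this sketch).
Word = namedtuple("Word", ["word", "speaker_tag", "start_seconds"])


def group_by_speaker(words):
    """Collapse a flat word list into (speaker_tag, start_seconds, text) segments."""
    segments = []
    for w in words:
        if segments and segments[-1][0] == w.speaker_tag:
            # Same speaker as the previous word: extend the current segment.
            segments[-1][2].append(w.word)
        else:
            # Speaker changed (or first word): start a new segment.
            segments.append((w.speaker_tag, w.start_seconds, [w.word]))
    # The last segment is included even though no speaker change follows it.
    return [(tag, start, " ".join(ws)) for tag, start, ws in segments]


def format_segments(segments):
    """Render segments in the 'Speaker N: Start Time T: text' layout."""
    return [
        "Speaker {}: Start Time {:.4f}: {}".format(tag, start, text)
        for tag, start, text in segments
    ]
```

With the real API, each `word_info` would first have to be mapped to a `Word`, converting its start time to seconds in whatever way the installed client library version requires.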

Hope you can assist.

MarketerInCoderClothes · Answer 1 · 2019-02-25T13:35:02.307


I haven't had any issues with v1p1beta1 on my end so far.

Suggestion #1: Maybe an obvious suggestion, but does your project allow "data logging"? It's required for using more advanced features/models. Maybe try that? You can turn it off after testing, if it's not changing your outcome.

data-logging reference: https://cloud.google.com/speech-to-text/docs/data-logging

Suggestion #2: try using this line below:

from google.cloud import speech_v1p1beta1
client = speech_v1p1beta1.SpeechClient()

Suggestion #3: try adding the sample rate in your config

sample_rate_hertz = 44100
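
The value 44100 above is only an example; the rate passed in the config should match the actual file. As a minimal sketch, assuming a local copy of the WAV file is at hand (the helper name `wav_header_info` is made up for illustration), the rate and channel count can be read from the header with Python's standard `wave` module:

```python
import wave


def wav_header_info(path):
    """Return (sample_rate_hz, channel_count) read from a local WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnchannels()
```

The first value returned is what would go into `sample_rate_hertz`. This only works on a local file path, not on a `gs://` URI.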

edited Feb 25 '19 at 13:35

answered Feb 25 '19 at 13:30


Did you try the code with the link in my question? And yes, I already have data logging enabled. – Stryker Feb 25 '19 at 17:45
No, I didn't; I just have it working for my own files. Did you try Suggestion #2 by any chance? – MarketerInCoderClothes Feb 25 '19 at 21:44
Yes, I did, but if you look closely, it is the same as what I already do with "from google.cloud import speech_v1p1beta1 as speech". I tried it anyway. Are you running long_running_recognize? How big is your audio file? 30 or 60 minutes long? – Stryker Feb 25 '19 at 21:46
Yes, long running, 15 to 35 minutes. – MarketerInCoderClothes Feb 25 '19 at 22:30
