I'm trying to get a transcript from an audio file using the Python SpeechRecognition library. No matter what pause_threshold, duration, or other settings I use, I always get exactly the same output: roughly the first 30 seconds of an 80-second recording, after which the transcript cuts off.

import speech_recognition as sr
import moviepy.editor as mp

clip = mp.VideoFileClip(r"recording2.webm")
clip.audio.write_audiofile(r"converted.wav")

r = sr.Recognizer()
r.pause_threshold = 10
# r.energy_threshold = 4000

audio = sr.AudioFile("converted.wav")
with audio as source:
    audio_file = r.record(source, duration=90)

result = r.recognize_azure(audio_file, key=AZUREKEY, language="en-US", show_all=False, location="westeurope")
print(result)

No matter how I set it up, I still get the same result.

  • I'm not sure, but it may be restricted by the Google servers that convert it. For longer audio you may need to register for the Google API and use special methods to send longer audio. – furas Nov 09 '21 at 22:59
  • Google Doc: [Transcribe long audio files](https://cloud.google.com/speech-to-text/docs/async-recognize) – furas Nov 09 '21 at 23:02
  • I'm using Azure Speech Service, but I guess it could be restricted on their servers just as on Google's. – eeveepotato Nov 10 '21 at 08:03

1 Answer

I'm not sure this is the correct approach, but it works well enough for now: I split the audio into 30-second chunks, transcribe each chunk separately, and concatenate the results into the full transcript.

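import math

result = ""
with audio as source:
    # DURATION (in seconds) is set once the file has been opened
    no_of_chunks = math.ceil(source.DURATION / 30)
    r.adjust_for_ambient_noise(source)
    for chunk in range(no_of_chunks):
        # each record() call continues where the previous one stopped
        audio_data = r.record(source, duration=30)
        transcript = r.recognize_azure(audio_data, key=AZURE_KEY, language="en-US", show_all=False,
                                       location="westeurope")
        result += transcript + " "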
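
The same chunking idea can be pulled out into a small helper, which makes the loop easier to test without calling Azure. This is only a sketch: `chunk_count` and `transcribe_in_chunks` are names I made up, and the `record_chunk` and `recognize` callables are placeholders for `r.record(...)` and `r.recognize_azure(...)`. I'm also assuming that some SpeechRecognition releases may return a `(text, confidence)` tuple from `recognize_azure` instead of a plain string, so the helper accepts either.

```python
import math
from typing import Any, Callable, List


def chunk_count(total_seconds: float, chunk_seconds: float = 30.0) -> int:
    """Number of chunks needed to cover total_seconds of audio."""
    if total_seconds <= 0:
        return 0
    return math.ceil(total_seconds / chunk_seconds)


def transcribe_in_chunks(
    record_chunk: Callable[[float], Any],
    recognize: Callable[[Any], Any],
    total_seconds: float,
    chunk_seconds: float = 30.0,
) -> str:
    """Record and recognize consecutive chunks, then join the text.

    record_chunk(seconds) should return the next chunk of audio;
    recognize(audio) should return its transcript.
    """
    parts: List[str] = []
    for _ in range(chunk_count(total_seconds, chunk_seconds)):
        audio_data = record_chunk(chunk_seconds)
        text = recognize(audio_data)
        # Assumption: the recognizer may return (text, confidence).
        if isinstance(text, tuple):
            text = text[0]
        if text:
            parts.append(text.strip())
    return " ".join(parts)


# Hypothetical wiring with the objects from the question
# (r, audio and AZURE_KEY are defined there, not here):
#
# with audio as source:
#     result = transcribe_in_chunks(
#         lambda secs: r.record(source, duration=secs),
#         lambda a: r.recognize_azure(a, key=AZURE_KEY,
#                                     language="en-US",
#                                     location="westeurope"),
#         source.DURATION,
#     )
```

Collecting the pieces in a list and joining them at the end also avoids the trailing space that `result += transcript + " "` leaves behind.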