I've written a Python script that splits a roughly hour-long MP3 into 5-minute chunks, converts them to FLAC, uploads them to a Google Cloud Storage bucket, and runs Speech-to-Text recognition on them. However, it's pretty slow: every 5-minute chunk takes about 2 minutes, and a 53-minute audio file took about 25 minutes in total. Shouldn't it be far faster? This is the part of the code that does the speech-to-text step:
from google.cloud import speech
from google.cloud.speech import enums, types

# Create the client once instead of once per chunk.
client = speech.SpeechClient.from_service_account_json('credentials2.json')

# x is the index of the last chunk; it is set earlier in the script.
for i in range(0, x + 1):
    storage_uri = 'gs://MYBUCKET/sound-%s.flac' % i
    print(storage_uri)

    # The audio is read from the bucket, so the local file is not re-read here.
    audio = {"uri": storage_uri}
    # Note: this flag is never passed to the config, so it has no effect.
    enable_speaker_diarization = True
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=48000,
        language_code='pl-PL',
        audio_channel_count=1)

    # Start the job and block until this chunk has been transcribed.
    operation = client.long_running_recognize(config, audio)
    response = operation.result()

    # Append this chunk's transcript to the output file.
    with open("transkrypcja.txt", "a") as data:
        for result in response.results:
            alternative = result.alternatives[0]
            data.write(alternative.transcript + '\n')
        data.write('\n\n\n\n\n')

print('done')
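
As a rough sanity check on the timings above, here is a back-of-the-envelope sketch. It assumes the chunks are processed strictly one after another, as the loop does, and that the shorter final chunk costs about as much as a full one. The variable names are only illustrative.

```python
import math

# Figures taken from the description above.
audio_minutes = 53       # length of the source recording
chunk_minutes = 5        # length of each chunk
minutes_per_chunk = 2    # approximate recognition time observed per chunk

# The last chunk is shorter than 5 minutes, so round up.
chunk_count = math.ceil(audio_minutes / chunk_minutes)

# Assumption: each chunk is only started after the previous one has finished.
sequential_minutes = chunk_count * minutes_per_chunk

print(chunk_count, sequential_minutes)
```

That gives 11 chunks and about 22 minutes of waiting, which is in the same range as the roughly 25 minutes observed for the whole file.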