I have a use case where audio files (.wav) are uploaded to Blob Storage, which triggers a Function that extracts the text from the audio. At the moment, the only approach that works is having the audio file available locally; the audio config can't take the URI of the audio file. The code I'm using is this:

import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = "sub-key", "westeurope"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
audio_input = speechsdk.AudioConfig(filename="**BLOB URI**")

speech_recognizer = speechsdk.SpeechRecognizer(speech_config, audio_input)

result = speech_recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

From my research, a URI can't be passed as the filename (the bold part of the code). Solutions such as downloading the file locally first won't work for me.

I tried reading the audio as a stream, but I couldn't find a way to convert it to an AudioInputStream.

Any help would be great. Thanks.

1 Answer

You can use the Batch transcription REST API, which lets you transcribe a large amount of audio held in storage. You can point to audio files using a typical URI or a shared access signature (SAS) URI and receive the transcription results asynchronously. With the v3.0 API, you can transcribe one or more audio files, or process a whole storage container.

Please see the following:
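As a rough illustration, creating a transcription job for a blob might look like the sketch below. The function name `build_transcription_request` and its defaults are made up for this example, and the endpoint path and the `contentUrls`, `locale`, and `displayName` fields are written from my understanding of the v3.0 API, so check them against the current documentation before relying on them. The request is only built here, not sent.

```python
import json
import urllib.request


def build_transcription_request(region, subscription_key, content_urls,
                                locale="en-US", display_name="Blob transcription"):
    """Build (but do not send) a POST request that creates a batch transcription.

    Hypothetical helper: endpoint and body fields are assumed from the v3.0 API.
    """
    endpoint = (
        f"https://{region}.api.cognitive.microsoft.com"
        "/speechtotext/v3.0/transcriptions"
    )
    body = {
        "contentUrls": list(content_urls),  # blob URIs, typically with a SAS token
        "locale": locale,
        "displayName": display_name,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace with your own key, region and blob SAS URI.
request = build_transcription_request(
    "westeurope", "sub-key", ["https://example.blob.core.windows.net/audio/file.wav"]
)
```

Sending it with `urllib.request.urlopen(request)` should create the job. Because the job runs asynchronously, the Function would then need to poll the transcription resource returned in the response, or a second Function could pick up the results later.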

https://medium.com/@abhishekcskumar/logic-apps-large-audio-speech-to-text-batch-transcription-d71e93bbaeec

https://github.com/PanosPeriorellis/Speech_Service-BatchTranscriptionAPI/blob/master/CrisClient/Program.cs

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription#sample-code
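If batch transcription is more than you need and the Function already receives the blob contents as bytes, another option may be to feed those bytes to the Speech SDK through a `PushAudioInputStream` instead of a filename. The following is only a sketch under that assumption: `wav_to_pcm` and `recognize_from_bytes` are names invented here, the file is assumed to be uncompressed PCM WAV, and the whole payload is held in memory.

```python
import io
import wave


def wav_to_pcm(wav_bytes):
    """Return (raw PCM bytes, sample rate, bits per sample, channels) for a PCM WAV payload."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        pcm = wav.readframes(wav.getnframes())
        return pcm, wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def recognize_from_bytes(wav_bytes, speech_key, service_region):
    """Run single-utterance recognition on in-memory WAV bytes (hypothetical helper)."""
    # Imported here so the WAV helper above can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    pcm, rate, bits, channels = wav_to_pcm(wav_bytes)

    # Describe the raw audio so the service can interpret the pushed bytes.
    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=rate, bits_per_sample=bits, channels=channels
    )
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
    push_stream.write(pcm)
    push_stream.close()  # signals that no more audio will arrive

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    return recognizer.recognize_once()
```

In a blob-triggered Function this would be called with the bytes read from the input blob. Note that `recognize_once()` returns after the first utterance, so longer recordings would need continuous recognition or the batch API above.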

Ram