
I am using the Azure Speech to Text service from Python to process a batch of audio files. These are the steps I currently perform:

  1. Download the audio from the web server to the local 'C:/audio' folder.
  2. Pass the path of the downloaded audio to the Speech SDK: AudioConfig(filename='C:/audio/my_audio.wav')

Rather than downloading to the local machine, I want to get the file from the server and pass it directly to the Speech to Text service. To do that:

  1. I store the audio as bytes in an audio buffer, like this: raw_audio = my_audio_in_bytes # <class 'bytes'>

  2. Then I pass the audio buffer to AudioConfig(filename=raw_audio). This does not work, because the filename parameter expects a file path.

Is there a way to pass an audio buffer to this service?

Python configuration code:

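import azure.cognitiveservices.speech as speechsdk

# speech_key and service_region hold the subscription key and region of the Speech resource
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
audio_config = speechsdk.audio.AudioConfig(filename='C:/audios/audio1.wav')
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)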
user1990
  • Check the documentation. The AudioConfig function accepts a `stream` parameter, although I cannot tell you what format it wants. – Tim Roberts Mar 03 '21 at 19:56
  • I tested with streams as well. The push/pull streams accept the raw data, but the final transcripts are messy, with a lot of redundant words. Hence I tried this approach. – user1990 Mar 03 '21 at 20:01
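
The stream route mentioned in the comments can be sketched roughly as follows. This is a minimal sketch, not a confirmed fix: it assumes raw_audio holds a complete PCM WAV file, and it assumes (without having verified it for this case) that the messy transcripts could come from pushing the WAV header or a mismatched sample format into the stream. The helper names split_wav_bytes and recognize_from_bytes are hypothetical; the Speech SDK calls are the ones used for push streams (AudioStreamFormat, PushAudioInputStream, AudioConfig(stream=...)).

```python
import io
import wave


def split_wav_bytes(raw_audio: bytes):
    """Parse an in-memory PCM WAV file (hypothetical helper).

    Returns (sample_rate, bits_per_sample, channels, pcm_frames), where
    pcm_frames is the audio data without the WAV header.
    """
    with wave.open(io.BytesIO(raw_audio), "rb") as wav:
        sample_rate = wav.getframerate()
        bits_per_sample = wav.getsampwidth() * 8
        channels = wav.getnchannels()
        pcm_frames = wav.readframes(wav.getnframes())
    return sample_rate, bits_per_sample, channels, pcm_frames


def recognize_from_bytes(raw_audio: bytes, speech_key: str, service_region: str) -> str:
    """Feed WAV bytes to the Speech SDK through a push stream (hypothetical helper)."""
    # Imported here so the WAV helper above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    sample_rate, bits, channels, pcm_frames = split_wav_bytes(raw_audio)

    # Describe the PCM data explicitly instead of relying on the default format.
    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=sample_rate,
        bits_per_sample=bits,
        channels=channels,
    )
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
    push_stream.write(pcm_frames)
    push_stream.close()  # signals end of audio

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    # recognize_once() returns after the first recognized utterance only.
    return recognizer.recognize_once().text
```

For recordings with more than one utterance, continuous recognition would be needed instead of recognize_once(); that part is left out of the sketch.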

1 Answer


@user1990, per our discussion on this GitHub issue, please use batch transcription, as the Speech SDK does not directly support recognizing from a WAV file hosted on a web service (you would first need to download it locally).
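
If the SDK is still used with a file path, the "download it locally" step does not have to leave files in 'C:/audio'. Below is a minimal sketch that writes the bytes to a temporary file and removes it afterwards. The helper names bytes_to_temp_wav and recognize_via_temp_file are hypothetical, and raw_audio is assumed to hold a complete WAV file.

```python
import os
import tempfile


def bytes_to_temp_wav(raw_audio: bytes) -> str:
    """Write WAV bytes to a temporary .wav file and return its path (hypothetical helper)."""
    # delete=False so the file can be reopened by path, which Windows requires.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(raw_audio)
        return tmp.name


def recognize_via_temp_file(raw_audio: bytes, speech_key: str, service_region: str) -> str:
    """Recognize speech from WAV bytes through a short-lived temp file (hypothetical helper)."""
    # Imported here so the temp-file helper above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    path = bytes_to_temp_wav(raw_audio)
    try:
        speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
        audio_config = speechsdk.audio.AudioConfig(filename=path)
        recognizer = speechsdk.SpeechRecognizer(
            speech_config=speech_config, audio_config=audio_config
        )
        # recognize_once() returns after the first recognized utterance only.
        text = recognizer.recognize_once().text
        # Drop SDK objects so any handle on the file can be released before removal.
        del recognizer, audio_config
        return text
    finally:
        try:
            os.remove(path)
        except OSError:
            pass  # the file may still be held open; leave it for later cleanup
```

This keeps the existing AudioConfig(filename=...) code path unchanged and only replaces the manual download folder with a temporary file.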

Darren Cohen