MoviePy: importing audio from text-to-speech in memory

Question

I'm trying to use Azure text-to-speech together with MoviePy to create the audio stream for a video.

result = synthesizer.speak_ssml_async(xml_string).get()
stream = AudioDataStream(result)

The output of this process is:

<azure.cognitiveservices.speech.AudioDataStream at 0x2320cb87ac0>

However, MoviePy is not able to import this with the following command:

audioClip = AudioFileClip(stream)

This is giving me the error:

'AudioDataStream' object has no attribute 'endswith'

Do I need to convert the Azure stream to .wav? If so, how? I need to do the entire process without writing .wav files locally (e.g. with stream.save_to_wav_file), using only in-memory streams.

Can someone shed some light on this, please?

Based on the source code of AudioFileClip (https://zulko.github.io/moviepy/_modules/moviepy/audio/io/AudioFileClip.html), you must specify a file name; it does not provide a way to initialize from a stream. You can write the stream to a local .wav file and remove it after use. — Stanley Gong, Feb 04 '21 at 02:17
@StanleyGong I found that there is a way with "from moviepy.audio.AudioClip import AudioArrayClip", using AudioArrayClip instead of AudioFileClip, but it does not seem to like my audio stream. It tells me "'numpy.uint8' object is not iterable". I cannot save a .wav file locally because this runs in an Azure function without storage, only memory. — Giovanni Petrone, Feb 04 '21 at 08:13
Actually, an Azure function has its own file system, and you can save temp files to it. I think this post could be helpful: https://stackoverflow.com/questions/63265669/is-possible-to-save-a-temporaly-file-in-a-azure-function-linux-consuption-plan-i — Stanley Gong, Feb 04 '21 at 08:26
@StanleyGong thanks for your input. I am using dir_path = tempfile.gettempdir() in my Azure function to get the temp file location. However, when I try stream.save_to_wav_file(dir_path+ "temp.wav"), the Azure function fails. — Giovanni Petrone, Feb 04 '21 at 23:23
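The AudioArrayClip route can in principle work without any file. AudioArrayClip expects a numpy array of shape (n_frames, n_channels) with float samples in [-1, 1], so the "'numpy.uint8' object is not iterable" error is consistent with passing it the raw bytes as a flat uint8 array (MoviePy then tries to iterate over a single byte as if it were a frame). Below is a minimal sketch that decodes in-memory WAV bytes with the standard-library wave module. It assumes 16-bit PCM; the helper name wav_bytes_to_frames is hypothetical, and getting the WAV bytes from result.audio_data with the SDK's default RIFF output format is an assumption worth verifying.

```python
import array
import io
import sys
import wave


def wav_bytes_to_frames(wav_bytes):
    """Decode in-memory 16-bit PCM WAV bytes into (frames, fps).

    frames is a list of per-frame lists with one float in [-1.0, 1.0) per
    channel, i.e. the (n_frames, n_channels) layout AudioArrayClip expects
    once wrapped in numpy.array(). Hypothetical helper, not part of MoviePy.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wav.getnchannels()
        fps = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # Interpret the raw bytes as signed 16-bit samples.
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Scale to floats and group the interleaved samples into frames.
    scaled = [s / 32768.0 for s in samples]
    frames = [scaled[i:i + n_channels] for i in range(0, len(scaled), n_channels)]
    return frames, fps
```

Usage would then look something like frames, fps = wav_bytes_to_frames(result.audio_data) followed by clip = AudioArrayClip(numpy.array(frames), fps=fps). For long audio, decoding with numpy.frombuffer(raw, dtype="<i2") instead of the pure-Python loop would be considerably faster.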
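One detail in that call: tempfile.gettempdir() returns the directory without a trailing separator, so dir_path + "temp.wav" produces a path like /tmptemp.wav, a file in the root directory rather than inside the temp directory. That may explain the failure, although the thread does not show the actual error. A minimal sketch of the "write the .wav, use it, then remove it" suggestion, building the path with os.path.join; use_temp_wav and its two callables are hypothetical hooks, not an SDK or MoviePy API:

```python
import os
import tempfile


def use_temp_wav(write_wav, read_wav):
    """Write audio to a throwaway .wav file, read it back, then delete it.

    write_wav(path) and read_wav(path) are caller-supplied callables
    (hypothetical hooks, e.g. stream.save_to_wav_file and a function that
    opens the file with AudioFileClip and extracts what it needs).
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # os.path.join inserts the separator that tmp_dir + "temp.wav" would miss.
        path = os.path.join(tmp_dir, "temp.wav")
        write_wav(path)
        # Everything needed must be read here: the directory is deleted on exit.
        return read_wav(path)
```

For example, use_temp_wav(stream.save_to_wav_file, read_duration), where read_duration is a hypothetical helper that opens AudioFileClip(path), reads .duration and closes the clip before returning, since the file disappears as soon as use_temp_wav returns.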

score 0 · Accepted Answer · answered Feb 05 '21 at 03:17

I wrote an HTTP-triggered Python function for you; just try the code below:

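import os
import tempfile

import azure.functions as func
import azure.cognitiveservices.speech as speechsdk
# MoviePy locates ffmpeg through the imageio-ffmpeg package (pip install imageio-ffmpeg);
# the old imageio.plugins.ffmpeg.download() helper is deprecated and raises an error
# in recent imageio versions, so it is not called here.
from moviepy.editor import AudioFileClip


speech_key = "<speech service key>"
service_region = "<speech service region>"
# os.path.join adds the path separator between the temp dir and the file name
temp_file_path = os.path.join(tempfile.gettempdir(), "result.wav")
text = 'hello, this is a test'

def main(req: func.HttpRequest) -> func.HttpResponse:
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig()

    # audio_config=None: no playback device; the audio stays available on the result
    speech_synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config,
        auto_detect_source_language_config=auto_detect_source_language_config,
        audio_config=None)

    result = speech_synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        # Without this check AudioFileClip would fail on a missing file
        return func.HttpResponse(f"Speech synthesis failed: {result.reason}", status_code=500)

    # Save the synthesized audio to the function's temp directory
    stream = speechsdk.AudioDataStream(result)
    stream.save_to_wav_file(temp_file_path)

    # AudioFileClip needs a file name, so read the temp file back
    myclip = AudioFileClip(temp_file_path)
    duration = myclip.duration
    myclip.close()  # release the ffmpeg reader

    return func.HttpResponse(str(duration))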

The logic is simple: get the speech audio from the Speech service, save it to a temp file, and use AudioFileClip to read its duration.

Result: [screenshot not available]

If you still get errors, you can find the error details here: [screenshot not available]

Let me know if you have any further questions.

Thanks for your input, I managed to use the Azure Functions file system to do the trick. However, I ran into the issue I mentioned in my reply: Python error code 137. I wonder whether my Azure function is being killed by some process; it happens at the very end, during the rendering of the video. — Giovanni Petrone, Feb 05 '21 at 23:50
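About the 137: on Linux, an exit status above 128 conventionally means the process was terminated by signal number (status minus 128). For 137 that is signal 9, SIGKILL, which is also what the kernel's out-of-memory killer sends, so running out of memory while rendering the video is a plausible cause, though nothing in this thread confirms it. A small stdlib sketch of that decoding (describe_exit_code is a hypothetical helper name):

```python
import signal


def describe_exit_code(code):
    """Explain a shell-style exit status using the POSIX convention
    that 128 + N means "terminated by signal N"."""
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:  # no such signal number on this platform
            pass
    return f"exited with status {code}"


print(describe_exit_code(137))
```

On Linux this prints killed by SIGKILL. Windows does not define SIGKILL, so there the helper falls back to the plain status message.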
