
I want a .webm file to be converted to a .wav file after it hits my S3 bucket. I followed this tutorial and tried to adapt it to my use case using the .webm -> .wav ffmpeg command described here.

My AWS Lambda function generally works: when my .webm file hits the source bucket, it is converted to .wav and ends up in the destination bucket. However, the resulting .wav file is always 0 bytes (the .webm file is not, and it contains the expected audio). Did I adapt the code incorrectly? I only changed the ffmpeg_cmd line from the first link.

import json
import os
import subprocess
import shlex
import boto3

S3_DESTINATION_BUCKET = "hmtm-out"
SIGNED_URL_TIMEOUT = 60

def lambda_handler(event, context):

    s3_source_bucket = event['Records'][0]['s3']['bucket']['name']
    s3_source_key = event['Records'][0]['s3']['object']['key']

    s3_source_basename = os.path.splitext(os.path.basename(s3_source_key))[0]
    s3_destination_filename = s3_source_basename + ".wav"

    s3_client = boto3.client('s3')
    s3_source_signed_url = s3_client.generate_presigned_url('get_object',
        Params={'Bucket': s3_source_bucket, 'Key': s3_source_key},
        ExpiresIn=SIGNED_URL_TIMEOUT)
    
    ffmpeg_cmd = "/opt/bin/ffmpeg -i \"" + s3_source_signed_url + "\" -c:a pcm_f32le " + s3_destination_filename + " -"
    
    
    command1 = shlex.split(ffmpeg_cmd)
    p1 = subprocess.run(command1, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    resp = s3_client.put_object(Body=p1.stdout, Bucket=S3_DESTINATION_BUCKET, Key=s3_destination_filename)

    return {
        'statusCode': 200,
        'body': json.dumps('Processing complete successfully')
    }
 
marc_s
eartoolbox
  • Never sure which statements actually wait for subprocesses / commands but did you make sure the `subprocess.run` waits for the command to finish before returning to p1? If that is not the case you upload an empty file the instant after trying to convert into that file but the upload is done before the conversion even started. – luk2302 Apr 10 '21 at 17:27
  • According to the description, the command waits: https://docs.python.org/3/library/subprocess.html – eartoolbox Apr 10 '21 at 17:49

2 Answers


The code as presented uploads whatever ffmpeg writes to stdout. For that to work, ffmpeg needs to actually write its output data to stdout. Breaking the command down:

"-i \"" + s3_source_signed_url + "\" " +   # The input filename to use
"-c:a pcm_f32le " +                        # The encoder to use
s3_destination_filename + " " +            # The output filename to write to
"-"                                        # Output data to stdout

In other words, you're telling ffmpeg to use two different outputs. This is not what you want. On top of that, if you remove the output filename so it only attempts to use stdout, it will not know which file format to use.
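To make the two outputs visible, here is a small sketch that tokenizes the original command string with `shlex.split`, the same way the question's code does. The URL and the file name `clip.wav` are placeholder values chosen for illustration, not taken from the question.

```python
import shlex

# Placeholder values for illustration only.
signed_url = "https://example.com/clip.webm?X-Amz-Signature=abc"
destination_filename = "clip.wav"

# The command string as it is built in the question.
ffmpeg_cmd = (
    "/opt/bin/ffmpeg -i \"" + signed_url + "\" -c:a pcm_f32le "
    + destination_filename + " -"
)

args = shlex.split(ffmpeg_cmd)

# Everything after the codec value is an output target:
# first a file on disk, then "-" (stdout).
outputs = args[args.index("pcm_f32le") + 1:]
print(outputs)  # ['clip.wav', '-']
```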

If you use a command like:

ffmpeg_cmd = "/opt/bin/ffmpeg -i \"" + s3_source_signed_url + "\" -c:a pcm_f32le -f wav -"

It should do what you're after. Here ffmpeg has been instructed to output as a wav file, and send the output to stdout only.
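As a rough sketch of how this could look in the handler, the command can be built as an argument list (so no quoting or `shlex.split` is needed) and the result checked before uploading. The helper names `build_ffmpeg_cmd` and `convert_to_wav_bytes` are made up for this example; the ffmpeg path and options are the ones from the question.

```python
import subprocess

FFMPEG_PATH = "/opt/bin/ffmpeg"  # path used in the question


def build_ffmpeg_cmd(source_url, ffmpeg_path=FFMPEG_PATH):
    """Return the argument list for a webm -> wav conversion to stdout."""
    return [
        ffmpeg_path,
        "-i", source_url,     # input: the presigned S3 URL
        "-c:a", "pcm_f32le",  # audio codec from the question
        "-f", "wav",          # stdout has no extension, so name the format
        "-",                  # write the result to stdout only
    ]


def convert_to_wav_bytes(source_url):
    """Run ffmpeg and return the WAV bytes, raising if nothing was produced."""
    result = subprocess.run(
        build_ffmpeg_cmd(source_url),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0 or not result.stdout:
        # ffmpeg writes its diagnostics to stderr, so surface them here
        raise RuntimeError(result.stderr.decode(errors="replace"))
    return result.stdout
```

In the handler, the return value of `convert_to_wav_bytes(s3_source_signed_url)` would then be passed as `Body` to `put_object`. Raising on empty output means a failed conversion shows up in the logs instead of as a 0-byte upload.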

Anon Coward
  • Yes, this works! And thank you for the explanation. How would you recommend running another process on the resulting .wav file? Would you set up another trigger from the destination bucket, or can you do something else with the .wav file after it has been processed? – eartoolbox Apr 10 '21 at 18:35
  • There are pros and cons to both approaches. Personally, I prefer keeping Lambdas as self-contained as possible, so add a process that triggers off the creation of the wav files that does some further work. – Anon Coward Apr 10 '21 at 18:42
  • I am experiencing a similar issue with my next trigger (i.e., a separate Lambda function which takes the .wav created by the first). Here is the Python line I am using to run the command: `sonic_annotator_cmd = "/opt/bin/sonic-annotator -f -d vamp:pyin:pyin:notes -w csv " + s3_source_signed_url + " -"` This results in an empty .csv, but it should contain relevant data based on the .wav file. – eartoolbox Apr 11 '21 at 05:39

I tried the approach that outputs data to stdout, but it ran for more than 15 minutes and failed.

So I use EFS, which can hold files larger than 512 MB. See: Using Amazon EFS with Lambda - AWS Lambda
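A minimal sketch of what the file-based variant might look like, assuming an EFS file system is mounted on the function at `/mnt/efs` (an assumed path; use whatever mount path is configured). The helper names `output_path_for` and `transcode_via_file` are hypothetical, and `s3_client` is expected to be a boto3 S3 client created by the caller.

```python
import os
import subprocess

EFS_MOUNT = "/mnt/efs"  # assumed mount path, not taken from the answer


def output_path_for(source_key, work_dir=EFS_MOUNT):
    """Map an S3 key such as 'uploads/clip.webm' to '<work_dir>/clip.wav'."""
    base = os.path.splitext(os.path.basename(source_key))[0]
    return os.path.join(work_dir, base + ".wav")


def transcode_via_file(s3_client, source_url, source_key, dest_bucket,
                       ffmpeg_path="/opt/bin/ffmpeg"):
    """Convert to a file on the mounted file system, then upload that file."""
    out_path = output_path_for(source_key)
    # -y overwrites a leftover file from an earlier run; the .wav extension
    # lets ffmpeg pick the output format, so no -f option is needed here.
    cmd = [ffmpeg_path, "-y", "-i", source_url, "-c:a", "pcm_f32le", out_path]
    try:
        subprocess.run(cmd, check=True,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        # upload_file reads from disk instead of holding the audio in memory
        s3_client.upload_file(out_path, dest_bucket, os.path.basename(out_path))
    finally:
        # Remove the intermediate file whether or not the upload succeeded.
        if os.path.exists(out_path):
            os.remove(out_path)
```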

LittleWat