I'd like to move towards serverless for audio transcoding routines in AWS. I've been trying to set up a Lambda function to do just that: execute a static FFmpeg binary and re-upload the resulting audio file. The static binary I'm using is here.

The Lambda function I'm using in Python looks like this:

import json
import os
import subprocess
from io import BytesIO

import boto3

s3client = boto3.client('s3')

os.system("cp -ra ./bin/ffmpeg /tmp/")
os.system("chmod -R 775 /tmp")

def lambda_handler(event, context):

    bucketname = event["Records"][0]["s3"]["bucket"]["name"]
    filename = event["Records"][0]["s3"]["object"]["key"]

    audioData = grabFromS3(bucketname, filename)

    with open('/tmp/' + filename, 'wb') as f:
        f.write(audioData.read())

    os.chdir('/tmp/')

    try:
        process = subprocess.check_output(['./ffmpeg -i /tmp/joe_and_bill.wav /tmp/joe_and_bill.aac'], shell=True, stderr=subprocess.STDOUT)
        pushToS3(bucketname, filename)
        return process.decode('utf-8')
    except subprocess.CalledProcessError as e:
        return e.output.decode('utf-8'), os.listdir()


def grabFromS3(bucket, file):

    obj = s3client.get_object(Bucket=bucket, Key=file)
    data = BytesIO(obj['Body'].read())

    return(data)

def pushToS3(bucket, file):

    s3client.upload_file('/tmp/' + file[:-4] + '.aac', bucket, file[:-4] + '.aac')

    return

You can listen to the output of this here. WARNING: Turn your volume down or your ears will bleed.

The original file can be heard here.

Does anyone have any idea what might be causing the encoding errors? It doesn't seem to be an issue with the file upload, since the MD5 of the file on the Lambda filesystem matches the MD5 of the uploaded file.

I've also tried building the static binary on an Amazon Linux instance in EC2, then zipping and porting it into the Lambda project, but the same issue persists.

I'm stumped! :(

jmkmay

1 Answer

Alright this is a fun one.

So it turns out the Python subprocess inherits stdin from some Lambda processes running in the background. I was watching this AWS re:Invent keynote, and the speaker was describing problems they had run into with exactly this behavior.

I added stdin=subprocess.DEVNULL to the subprocess call and the audio is now fixed.
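
For anyone who wants to see the change spelled out, here is a minimal sketch of a helper that runs a command with stdin detached. The name `run_without_stdin` is made up for illustration, and the paths in the commented-out call simply mirror the ones from the question; adapt both to your own function.

```python
import subprocess


def run_without_stdin(args):
    """Run a command with stdin detached and return its combined output as text."""
    completed = subprocess.run(
        args,
        stdin=subprocess.DEVNULL,   # do not inherit the parent process's stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # fold stderr into stdout, as in the question
        check=True,                 # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout.decode("utf-8")


# Hypothetical call, mirroring the paths from the question:
# log = run_without_stdin(
#     ["/tmp/ffmpeg", "-i", "/tmp/joe_and_bill.wav", "/tmp/joe_and_bill.aac"]
# )
```

Passing the arguments as a list without `shell=True` also avoids going through a shell. FFmpeg has a `-nostdin` option as well, which may be an alternative, though that is not tested here.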

Very interesting bug if you ask me.

jmkmay
  • Good find! For what it's worth, if it were me, I'd probably have FFmpeg just output directly to S3. It has a built-in HTTP client and can do PUT. You can probably pre-sign a URL with your standard tools in Python, and just pass that output URL to FFmpeg. – Brad Aug 24 '18 at 21:44
  • Yeah, I've been looking at this approach since there's a limit on ephemeral storage in `/tmp`. One issue is that we have some custom transcoding routines that don't use ffmpeg, and we'd need to rethink those if we were to go the streaming route. – jmkmay Aug 28 '18 at 18:37
  • You could always pipe to cURL, if those other transcoders can output to STDOUT. – Brad Aug 28 '18 at 18:39
  • Pipe to cURL? How would that work? Does cURL know how to treat stdin as a series of PUT requests? – jmkmay Aug 29 '19 at 19:03
  • What's your output? If you're outputting something like segments, no, that isn't going to work. But for a single file? It's fine. – Brad Aug 29 '18 at 19:42
  • YOU ARE AWESOME! This is something that's been bothering me for hours! – derekhh Feb 18 '19 at 08:39
  • @derekhh glad to be of help ^^ – jmkmay Feb 19 '19 at 16:37
  • Amazing, I was about to give up on using Lambda altogether for this reason. Thank you very much! – amaurs Apr 12 '19 at 21:00
  • Amazing, the issue is still relevant. Thank you for the help. – VadimK Apr 03 '20 at 16:45
  • So unexpected, so simple!!! Wasted 5 hours thinking it was caused by ffmpeg. Thank you – Jithu R Jacob May 15 '20 at 21:37
  • This deserves more upvotes! Had this issue when working with PyDub. Converted mp3s to wavs before creating the AudioSegment, worked like a charm. Thank you! – vgro Jul 15 '20 at 14:36
  • I found similar behavior when transcoding some videos: each time it would randomly lose a portion of the video, at a different place each time! I've been baffled by it for very long. I mean VERY long. As Jacob and others mentioned, I thought it was an ffmpeg problem and just didn't understand how it could be so random. Thank you for revealing this to us, and I sure hope this can be fixed sometime soon. (Unlikely, if they have already had this going for 2 years.) – Yunkai Xiao Aug 27 '20 at 21:03
  • Update: This solution worked, and still works on transcoding webm to mp3; while that weird behavior persisted transcoding mp4 to mp3. – Yunkai Xiao Jun 23 '21 at 19:06