
I have an AWS Lambda function (A) that returns n urls. I'd like to pass each of those urls individually and concurrently as parameters into another AWS Lambda function (B). Function B then processes the passed url and returns the result. Both of these functions are written in Python and I'd prefer to avoid other languages if possible. Does anyone have a definitive solution that accounts for timeouts, concurrency violations, other edge cases and/or errors?

Even with the maximum memory allocated, function A takes ~85 seconds just to set the payload and invoke function B 1,100 times. Is ~80 ms per invocation typical when calling another AWS Lambda function? Is there a faster way? Additionally, the CloudWatch Logs for function B are split across multiple log streams, making it hard to see all the invocations in one place to confirm whether things ran correctly, in what order, and where any errors or delays occurred.

I've looked at the boto3.client('lambda') docs.

I also leveraged "Using boto to invoke lambda functions how do I do so asynchronously?" and "AWS Lambda: call function from another AWS lambda using boto3 invoke" to get to my existing code.

This is the code I'm using for testing.

# Function A - using max Memory setting (3008 MB currently) to speed things up

import boto3
import json

def lambda_handler(event, context):
    #simulate 1,100 urls (more than the default concurrency limit of 1,000)
    n = 1100
    results = range(1, n+1)
    #invoke function B asynchronously
    for result in results:
        payload = {'url' : result}
        boto3.client('lambda').invoke(FunctionName='B', InvocationType='Event', Payload=json.dumps(payload))
    return {'statusCode': 200, 'body': json.dumps('Hello from Lambda!')}
# Function B - using the min Memory setting (128 MB currently)

import json
import time

def lambda_handler(event, context):
    #wait 5 seconds to simulate processing time
    time.sleep(5)
    #process passed payload from function A
    print(event['url'])
    return {'statusCode': 200, 'body': json.dumps('Bye from Lambda!')}


grove80904

1 Answer


Is ~80ms typical to invoke another AWS Lambda function?

That doesn't sound very bad to me, but there might be some room for improvement. The one thing that jumps out at me when looking at your code is that you are creating the AWS Lambda client object over and over. Try creating the client once, like this:

client = boto3.client('lambda')
for result in results:
    payload = {'url': result}
    client.invoke(FunctionName='B', InvocationType='Event', Payload=json.dumps(payload))

By reusing the same client object I think you will see a performance improvement due to the reuse of the underlying HTTP connection with the AWS API server.

Additionally, the CloudWatch Logs for function B separate the invocations among multiple log streams making it hard to see all the invocations in 1 place to confirm if things were done properly and/or in what order and/or where any errors/delays may be located.

You're dealing with more than a thousand asynchronous processes running on multiple servers. Viewing all those logs in one place is going to be a challenge. You might look into using something like CloudWatch Logs Insights.
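
To illustrate, here's a minimal sketch of pulling all of function B's log lines into one view with the CloudWatch Logs Insights API via boto3; it assumes function B logs to the default log group /aws/lambda/B and that a one-hour window and a 2,000-result limit cover your test run:

import time
import boto3

logs = boto3.client('logs')

# Query the last hour of function B's log group (default name /aws/lambda/B)
end = int(time.time())
query = logs.start_query(
    logGroupName='/aws/lambda/B',
    startTime=end - 3600,
    endTime=end,
    queryString='fields @timestamp, @logStream, @message '
                '| sort @timestamp asc | limit 2000',
)

# Poll until the query finishes, then print each log line with its stream
while True:
    response = logs.get_query_results(queryId=query['queryId'])
    if response['status'] in ('Complete', 'Failed', 'Cancelled', 'Timeout'):
        break
    time.sleep(1)

for row in response['results']:
    print({field['field']: field['value'] for field in row})

Sorting by @timestamp across streams gives you the invocation order in a single result set, which is the part that's hard to see in the console.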

Does anyone have a definitive solution that accounts for timeouts, concurrency violations, other edge cases and/or errors?

A typical pattern for managing timeouts, concurrency limits and other errors would be to send all the events to an SQS queue and let the queue trigger your second Lambda function. However, while your first Lambda function will complete just as fast as it is now, or possibly faster, the overall processing may take longer, because the queue will only hand work to your second function as fast as your concurrency limit allows; in exchange, throttled or failed messages are retried automatically (and can be routed to a dead-letter queue) rather than being lost.
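
A rough sketch of that pattern, assuming a hypothetical standard queue named url-tasks that you have created and configured as an event source for function B:

# Function A - enqueue each url instead of invoking B directly
import json
import boto3

sqs = boto3.client('sqs')
# 'url-tasks' is a placeholder queue name; create it and add it as an
# event source mapping for function B before using this
QUEUE_URL = sqs.get_queue_url(QueueName='url-tasks')['QueueUrl']

def lambda_handler(event, context):
    results = range(1, 1101)
    batch = []
    for result in results:
        batch.append({'Id': str(result), 'MessageBody': json.dumps({'url': result})})
        # send_message_batch accepts at most 10 entries per call
        if len(batch) == 10:
            sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=batch)
            batch = []
    if batch:
        sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=batch)
    return {'statusCode': 200, 'body': json.dumps('queued')}

# Function B - triggered by the queue; each record body is one payload
def lambda_handler_b(event, context):
    for record in event['Records']:
        print(json.loads(record['body'])['url'])

A redrive policy to a dead-letter queue then gives you a single place to inspect any urls that repeatedly fail.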

Another pattern that could be used to solve some of these issues would be to implement an exponential backoff algorithm in your first Lambda function. However, that would require your function's code to handle retries directly instead of leaning on other AWS services like SQS to handle them for you, and it would require adding pauses in your Lambda function. Those pauses could cause the first function to time out before it has successfully triggered all of the second function invocations, which just creates another error condition you would have to handle somehow.
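
For completeness, here's a minimal sketch of that approach, retrying only on Lambda's throttling error (TooManyRequestsException) with capped exponential backoff plus jitter:

import json
import random
import time
import boto3

client = boto3.client('lambda')

def invoke_with_backoff(payload, max_attempts=6):
    # Retry throttled invocations with capped exponential backoff plus jitter
    for attempt in range(max_attempts):
        try:
            client.invoke(FunctionName='B', InvocationType='Event',
                          Payload=json.dumps(payload))
            return True
        except client.exceptions.TooManyRequestsException:
            time.sleep(min(10, 2 ** attempt) + random.random())
    return False  # the caller still has to decide what to do with failures

Note that every sleep here counts against function A's own timeout, which is exactly the trade-off described above.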

Mark B