107

I have a Lambda function that’s triggered by a PUT to an S3 bucket.

I want to limit this Lambda function so that it’s only running one instance at a time – I don’t want two instances running concurrently.

I’ve had a look through the Lambda configuration and docs, but I can’t see anything obvious. I could write my own locking system, but it would be nice if this were already a solved problem.

How can I limit the number of concurrent invocations of a Lambda?

Mark B
alexwlchan
  • Interested in why you care how many invocations run concurrently. – jarmod Feb 03 '17 at 18:03
  • @jarmod This was at a time when I was thinking of running Terraform changes in Lambdas, and I didn’t know how to do remote state locking in Terraform itself. I dropped this idea – in part because you can’t limit concurrent Lambdas, and in part because I was worried about the five-minute timeouts. – alexwlchan Aug 10 '17 at 06:28

5 Answers

121

AWS Lambda now supports concurrency limits on individual functions: https://aws.amazon.com/about-aws/whats-new/2017/11/set-concurrency-limits-on-individual-aws-lambda-functions/

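The same limit can also be set programmatically via the `PutFunctionConcurrency` API. Below is a minimal sketch; the function name is a placeholder, and the client is passed in as a parameter (in real use it would be `boto3.client("lambda")`) so the snippet can be exercised without AWS credentials:

```python
def limit_to_single_instance(lambda_client, function_name):
    """Reserve a concurrency of 1 for the function, so at most one
    instance runs at a time; extra invocations are throttled."""
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=1,
    )
```

Note the trade-off discussed in the comments below: reserved concurrency is subtracted from the account-wide pool available to your other functions.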

Robert Chen
  • 2
    This is awesome. I was able to solve a concurrency problem when updating a file on S3. I am using a Lambda function to update a file on S3 with a concurrency count of 1. This ensures that only one EC2 instance at a time has write access to the file. This is much cheaper than using any managed DB on AWS for trivial usage. – rahul gupta Dec 17 '17 at 15:09
  • 2
    I know this is 2 years old, but this solved an issue I was having. Thank you! – Randomhero Nov 01 '19 at 10:38
  • 2
    Is this really 100% safe? I did some research and, if I understand it correctly, you might end up with a lot of valid messages failing due to throttling in case you receive a lot of messages, as described in this detailed article: https://data.solita.fi/lessons-learned-from-combining-sqs-and-lambda-in-a-data-project/ How do you configure your Visibility Timeout etc.? – Björn Grambow Jun 12 '20 at 10:00
  • @BjörnGrambow i think it depends how you invoke the function. Ideal use case for this would be async invocation or event invocation from SQS. The only issue you will run into in this use case is if your events grow quicker than you can process them. – GWed Aug 28 '20 at 09:54
  • 9
    This feature is strange. My use case is just wanting to limit the execution of a particular scheduled function to not more than one instance (upper bound). But it appears that enabling the concurrency limit comes with the downside of preventing any of your other Lambdas from using the reserved portion. So if you have 100 Lambdas that run infrequently but with a concurrency limit of 10 each, all of your account's Lambda capacity is taken whether they are actually running or not. I'll probably do something else like let the additional executions start but check for an external lock, etc. – Taylor D. Edmiston Sep 25 '21 at 15:55
  • Is this actually guaranteed race free? Using quotas as a threading construct seems unreliable. I think it's good for limiting load, but I would be less inclined to trust it as a [mutex](https://en.wikipedia.org/wiki/Mutual_exclusion) – Att Righ Oct 27 '21 at 17:00
  • 1
    I don't think this really sets an upper bound limit. This just reserves x amount of lambdas to be available if a burst of requests occur. – TemporaryFix Oct 13 '22 at 19:10
  • 2
    Yes, we also fell into this thinking it'll just limit the function's max concurrency to 1, but it seems it deallocates it from the pool the other Lambdas in the account use. They need to add a "max concurrency" setting. – Ben Oct 14 '22 at 19:56
  • 2
    @TemporaryFix "Reserve Concurrency" **does** set the upper bound limit AND it removes the concurrency "units" from the account's "public" pool. Validated the behavior via testing. – DuckieHo Oct 28 '22 at 04:44
  • @DuckieHo You are correct, I was confused with "Provisioned concurrency". This page explains the difference between the 2 https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html – TemporaryFix Oct 28 '22 at 15:45
29

I would suggest using Kinesis Streams (or alternatively DynamoDB + DynamoDB Streams, which have essentially the same behavior).

You can think of a Kinesis Stream as a queue. The good part is that you can use a Kinesis Stream as a trigger for your Lambda function. So anything that gets inserted into this queue will automatically be passed over to your function, in order. That way you will be able to process those S3 events one by one, one Lambda execution after the other (one instance at a time).

In order to do that, you'll need to create a Lambda function with the simple purpose of getting S3 Events and putting them into a Kinesis Stream. Then you'll configure that Kinesis Stream as your Lambda Trigger.
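A minimal sketch of that relay function could look like the following. This is an assumption of how it might be written, not code from the answer: the stream name is made up, and the Kinesis client is passed in as a parameter (in a real Lambda it would be `boto3.client("kinesis")`) so the logic can be exercised without AWS:

```python
import json

STREAM_NAME = "s3-events"  # placeholder stream name


def relay_handler(event, kinesis_client):
    """Forward each S3 record from the notification into a Kinesis stream.

    A single fixed partition key sends every record to the same shard,
    which preserves ordering (and matches the one-shard setup suggested
    in the comments below).
    """
    for record in event.get("Records", []):
        payload = {
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        }
        kinesis_client.put_record(
            StreamName=STREAM_NAME,
            Data=json.dumps(payload),
            PartitionKey="s3-events",  # fixed key => one shard => in order
        )
```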

Event Flow

When you configure the Kinesis Stream as your Lambda trigger, I suggest the following configuration:

  • Batch size: 1
    • This means that your Lambda will be called with only one event from Kinesis. You can select a higher number and you'll get a list of events of that size (for example, if you want to process the last 10 events in one Lambda execution instead of 10 consecutive Lambda executions).
  • Starting position: Trim horizon
    • This means it'll behave as a queue (FIFO)
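Those two settings map onto the event source mapping parameters (`BatchSize` and `StartingPosition`). A hedged sketch of creating the trigger, again with an injected client and placeholder names:

```python
def wire_stream_to_lambda(lambda_client, function_name, stream_arn):
    """Create the Kinesis trigger with the settings described above:
    one record per invocation, oldest records first."""
    return lambda_client.create_event_source_mapping(
        FunctionName=function_name,
        EventSourceArn=stream_arn,
        BatchSize=1,                      # one event per Lambda call
        StartingPosition="TRIM_HORIZON",  # start from the oldest record
    )
```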

A bit more info on AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda.

I hope this helps anyone with a similar problem.

P.S. Bear in mind that Kinesis Streams have their own pricing. Using DynamoDB + DynamoDB Streams might be cheaper (or even free due to the non-expiring Free Tier of DynamoDB).

dsaiztc
  • This answer is great! OP should really accept it as the solution. – GavinoGrifoni Jun 20 '17 at 13:03
  • 4
    To ensure the exact behaviour you need one more configuration: **Kinesis Shard Count: 1**. In a multi-shard _Kinesis Stream_ one _Lambda_ is triggered per shard, so we can have more than one _Lambda_ executing in parallel. – Raza Mar 21 '18 at 21:02
14

No, this is one of the things I'd really like to see Lambda support, but currently it does not. One of the problems is that if there were a lot of S3 PUT operations happening AWS would have to queue up all the Lambda invocations somehow, and there is currently no support for that.

If you built a locking mechanism into your Lambda function, what would you do with the requests you don't process due to a lock? Would you just throw those S3 notifications away?

The solution most people recommend is to have S3 send the notifications to an SQS queue, and then have your Lambda function scheduled to run periodically, like once a minute, and check if there is an item in the queue that needs to be processed.
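A rough sketch of what that scheduled poller could look like; the queue URL and message handler are whatever your setup uses, and the SQS client is injected (in a real Lambda, `boto3.client("sqs")`) so the loop can be tested without AWS:

```python
def drain_queue(sqs_client, queue_url, handle_message):
    """Process queued S3 notifications strictly one at a time.

    Intended to run on a schedule (e.g. a CloudWatch Events rule once
    a minute). Messages are fetched and handled one by one, so only a
    single notification is ever being processed at once.
    """
    processed = 0
    while True:
        resp = sqs_client.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=1
        )
        messages = resp.get("Messages", [])
        if not messages:
            return processed  # queue drained
        msg = messages[0]
        handle_message(msg["Body"])
        # Delete only after successful handling, so failures are retried
        # once the visibility timeout expires.
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"]
        )
        processed += 1
```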

Alternatively, have S3 send the notifications to SQS and just have a t2.nano EC2 instance with a single-threaded service polling the queue.

Mark B
5

I know this is an old thread, but I ran across it trying to figure out how to make sure my time-sequenced SQS messages were processed in order coming out of a FIFO queue, and not processed simultaneously or out of order by multiple Lambda threads running.

Per the documentation:

For FIFO queues, Lambda sends messages to your function in the order that it receives them. When you send a message to a FIFO queue, you specify a message group ID. Amazon SQS ensures that messages in the same group are delivered to Lambda in order. Lambda sorts the messages into groups and sends only one batch at a time for a group. If your function returns an error, the function attempts all retries on the affected messages before Lambda receives additional messages from the same group.

Your function can scale in concurrency to the number of active message groups.

Link: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html

So essentially, as long as you use a FIFO queue and submit your messages that need to stay in sequence with the same MessageGroupID, SQS/Lambda automatically handles the sequencing without any additional settings necessary.
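As a sketch, producing such messages could look like the following; the queue URL and group ID are placeholders, and the SQS client is injected (normally `boto3.client("sqs")`) for testability:

```python
import hashlib


def send_in_order(sqs_client, queue_url, bodies, group_id="single-lane"):
    """Send messages to a FIFO queue so they are delivered in order.

    All messages share one MessageGroupId, so SQS hands them to Lambda
    one batch at a time, in sequence. The deduplication ID is derived
    from the body here; alternatively, enable content-based
    deduplication on the queue and omit it.
    """
    for body in bodies:
        sqs_client.send_message(
            QueueUrl=queue_url,
            MessageBody=body,
            MessageGroupId=group_id,
            MessageDeduplicationId=hashlib.sha256(body.encode()).hexdigest(),
        )
```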

N. Walker
1

Have the S3 "Put events" cause a message to be placed on the queue (instead of invoking a Lambda function directly). The message should contain a reference to the S3 object. Then SCHEDULE a Lambda to "SHORT POLL the entire queue".

PS: S3 events cannot trigger a Kinesis Stream... only SQS, SNS, and Lambda (see http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#supported-notification-destinations). Kinesis Streams are expensive and used for real-time event handling.

marc_s
William Choy