7

This seemed like a simple use case when I started, but it turned out to be a lot harder than I had anticipated.

Problem

I have an AWS SQS queue acting as a job queue that triggers a worker AWS Lambda function. However, since the worker lambdas share non-scalable resources, it is important to limit the number of concurrently running lambdas to (for the sake of example) no more than 5.

Simple enough, according to "Managing Concurrency for a Lambda Function":

Reserved concurrency also limits the maximum concurrency for the function, and applies to the function as a whole

However, setting the Reserved concurrency property to 5 seems to be completely ignored by SQS: in my case the queue's Messages in Flight property shows closer to 20-30 concurrent executions, depending on the number of messages put into the queue.
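For reference, the Reserved concurrency setting described above can also be applied programmatically. The following is a minimal sketch, assuming the boto3 library and a hypothetical function name `job-worker`; the helper names are illustrative and not part of any AWS API. The client is passed in rather than created here, so the request-building part can be checked without AWS credentials.

```python
from typing import Any, Dict


def reserved_concurrency_request(function_name: str, limit: int) -> Dict[str, Any]:
    """Build the keyword arguments for Lambda's PutFunctionConcurrency call."""
    if limit < 0:
        raise ValueError("reserved concurrency cannot be negative")
    return {
        "FunctionName": function_name,
        "ReservedConcurrentExecutions": limit,
    }


def apply_reserved_concurrency(lambda_client: Any, function_name: str, limit: int) -> Any:
    """Apply the limit; lambda_client is assumed to be boto3.client("lambda")."""
    return lambda_client.put_function_concurrency(
        **reserved_concurrency_request(function_name, limit)
    )


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   apply_reserved_concurrency(boto3.client("lambda"), "job-worker", 5)
```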

Question

The closest I have come to a solution is to use an SQS FIFO queue and set the MessageGroupId of each message to a value from 1 to 5, chosen either at random or by alternating between them. However, because the workload is uneven, this is not optimal: it would be better to have the concurrency distributed by actual workload rather than by chance.

I have also tried AWS Step Functions, since the Map state has a MaxConcurrency parameter. This seemed to work well on small job queues, but because each state has an input/output limit of 32 KB, it was not feasible in my use case.
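The alternating variant of this workaround can be sketched as follows. This is a minimal illustration, not the exact code used: it assumes boto3, a FIFO queue with content-based deduplication enabled (otherwise a MessageDeduplicationId would also be needed), and hypothetical helper names. Since messages within one group are processed in order, five group IDs should cap the workers at five.

```python
import itertools
import json
from typing import Any, Dict, Iterable, Iterator


def round_robin_group_ids(n_groups: int = 5) -> Iterator[str]:
    """Yield "1", "2", ..., str(n_groups), then start over, indefinitely."""
    return itertools.cycle([str(i) for i in range(1, n_groups + 1)])


def fifo_send_request(queue_url: str, job: Dict[str, Any], group_id: str) -> Dict[str, str]:
    """Build the keyword arguments for SQS SendMessage on a FIFO queue."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(job),
        "MessageGroupId": group_id,
    }


def enqueue_jobs(sqs_client: Any, queue_url: str, jobs: Iterable[Dict[str, Any]],
                 n_groups: int = 5) -> None:
    """Send each job with the next group ID; sqs_client is assumed to be boto3.client("sqs")."""
    ids = round_robin_group_ids(n_groups)
    for job in jobs:
        sqs_client.send_message(**fifo_send_request(queue_url, job, next(ids)))
```

The drawback described above is visible here: the group is chosen by position in the sequence, not by how busy each group currently is.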

Has anyone found a better or alternative solution? Are there any other ways Reserved concurrency is supposed to be used?

Similar

Here are some similar questions I have found, but I think my question is different because I am not interested in limiting the total number of invocations, and (although I have not tried it myself) I cannot see why triggers from S3 or a Kinesis Stream would behave differently from SQS.

Adelost
  • 2
    Are you positive that there are more than 5 lambdas running? Just because a message is in flight doesn’t mean it has its own dedicated lambda running. SQS doesn’t know how many lambdas are allowed to run; it sends its jobs out and Lambda processes them however it processes them. SQS just makes sure they get processed. It could just be that each of your 5 lambdas is handling 4-6 jobs sequentially. Nothing about SQS should be able to override reserved concurrency. – bryan60 Apr 08 '20 at 22:43
  • 1
    To build on @bryan60’s comment, you can verify the number of concurrent executions with CloudWatch metrics and logs. If more messages are being processed than expected, it could also be the batch size? – hephalump Apr 08 '20 at 23:04
  • 1
    @bryan60, yes, it turns out you and hephalump are right. Messages in Flight does not seem to be an accurate representation, and at least according to the monitoring statistics of the lambda, no more than 5 lambdas were executing concurrently. – Adelost Apr 09 '20 at 07:07
  • @bryan60, however, this raises some questions about how the visibility timeout is handled when the lambda is throttled. I created a follow-up question if anyone is interested. https://stackoverflow.com/questions/61116499/avoiding-timeouts-when-aws-lambda-with-limited-concurrency-is-consuming-from-a-a – Adelost Apr 09 '20 at 08:03
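
The check suggested in the comments above, confirming the real concurrency from CloudWatch metrics rather than from Messages in Flight, can be sketched as follows. This is a rough sketch assuming boto3 and a hypothetical function name; the helper names are illustrative. It queries the per-function `ConcurrentExecutions` metric in the `AWS/Lambda` namespace and takes the highest per-minute maximum.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def concurrency_query(function_name: str, end: Optional[datetime] = None,
                      minutes: int = 60) -> Dict[str, Any]:
    """Build the keyword arguments for CloudWatch GetMetricStatistics."""
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "ConcurrentExecutions",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Maximum"],
    }


def peak_concurrency(response: Dict[str, Any]) -> float:
    """Return the highest 'Maximum' among the datapoints, or 0 if there are none."""
    points = response.get("Datapoints", [])
    return max((p["Maximum"] for p in points), default=0)


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   cw = boto3.client("cloudwatch")
#   print(peak_concurrency(cw.get_metric_statistics(**concurrency_query("job-worker"))))
```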

2 Answers

1

According to the AWS docs, SQS does not take reserved concurrency into account. If the number of batches to be processed is greater than the reserved concurrency, your messages might end up in a dead-letter queue:

If your function returns an error, or can't be invoked because it's at maximum concurrency, processing might succeed with additional attempts. To give messages a better chance to be processed before sending them to the dead-letter queue, set the maxReceiveCount on the source queue's redrive policy to at least 5. https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html

You can check this article for details: https://zaccharles.medium.com/lambda-concurrency-limits-and-sqs-triggers-dont-mix-well-sometimes-eb23d90122e0
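
The redrive policy recommended in the quoted documentation can be set along these lines. This is a minimal sketch assuming boto3; the queue URL and dead-letter queue ARN are placeholders you would supply, and the helper names are illustrative.

```python
import json
from typing import Any, Dict


def redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> Dict[str, str]:
    """Build the Attributes map for SQS SetQueueAttributes.

    RedrivePolicy is passed to SQS as a JSON-encoded string.
    """
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(policy)}


def apply_redrive_policy(sqs_client: Any, queue_url: str, dlq_arn: str,
                         max_receive_count: int = 5) -> Any:
    """Apply the policy to the source queue; sqs_client is assumed to be boto3.client("sqs")."""
    return sqs_client.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes=redrive_attributes(dlq_arn, max_receive_count),
    )
```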

Andriy Kharchuk
0

As of January 2023 this issue is resolved. You can use the maximum concurrency setting on the SQS event source, as described in the AWS blog post linked below. I had been using a FIFO queue with group IDs because my backend was non-scalable and I wanted to avoid throttling, since having too many messages end up in the DLQ does not help.

Critical Enhancement

https://aws.amazon.com/blogs/compute/introducing-maximum-concurrency-of-aws-lambda-functions-when-using-amazon-sqs-as-an-event-source/
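
For illustration, the maximum concurrency setting can be applied to an existing SQS event source mapping roughly as follows. This is a sketch assuming boto3; the mapping UUID is a placeholder for your own event source mapping, and the helper names are illustrative. The 2 to 1000 range check reflects the limits AWS documents for this setting.

```python
from typing import Any, Dict


def maximum_concurrency_request(mapping_uuid: str, maximum: int) -> Dict[str, Any]:
    """Build the keyword arguments for Lambda's UpdateEventSourceMapping call."""
    if not 2 <= maximum <= 1000:
        raise ValueError("maximum concurrency must be between 2 and 1000")
    return {
        "UUID": mapping_uuid,
        "ScalingConfig": {"MaximumConcurrency": maximum},
    }


def apply_maximum_concurrency(lambda_client: Any, mapping_uuid: str, maximum: int) -> Any:
    """Apply the setting; lambda_client is assumed to be boto3.client("lambda")."""
    return lambda_client.update_event_source_mapping(
        **maximum_concurrency_request(mapping_uuid, maximum)
    )
```

Unlike reserved concurrency, this limit is set on the event source mapping, so the SQS poller stops invoking the function once the limit is reached instead of having invocations throttled.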