
I have:

  • an SQS queue, consumed with a batch size of 1
  • a Lambda function with a reserved concurrency of 1
  • the Lambda configured to listen on the queue and do some work when messages arrive. That work typically takes around 8-12 seconds, never more than 15.
  • Emphasis on "listening": I'm not polling from a Lambda; instead, the Lambda is triggered by the SQS queue automatically.
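For reference, the setup in the list above can be expressed roughly as two AWS Lambda API calls. This is a minimal sketch, assuming boto3; `plan` and `apply_plan` are hypothetical helper names, not part of any library, and the function name and queue ARN in the usage line are placeholders. Note that the batch size belongs to the event source mapping (the trigger), not to the queue itself.

```python
"""Sketch: the reserved concurrency and SQS trigger from the question as boto3 calls.

`plan` and `apply_plan` are hypothetical helpers used for illustration only.
"""


def plan(function_name: str, queue_arn: str,
         reserved_concurrency: int = 1, batch_size: int = 1) -> list:
    """Return the Lambda API calls to make, as (method name, keyword arguments)."""
    return [
        # Cap how many instances of the function may run at the same time.
        ("put_function_concurrency", {
            "FunctionName": function_name,
            "ReservedConcurrentExecutions": reserved_concurrency,
        }),
        # Let Lambda's managed pollers read the queue and invoke the function
        # ("triggered" mode: no polling code of our own). BatchSize is set
        # here, on the event source mapping, not on the queue.
        ("create_event_source_mapping", {
            "EventSourceArn": queue_arn,
            "FunctionName": function_name,
            "BatchSize": batch_size,
            "Enabled": True,
        }),
    ]


def apply_plan(lambda_client, steps: list) -> None:
    """Execute the planned calls against a boto3 Lambda client."""
    for method_name, kwargs in steps:
        getattr(lambda_client, method_name)(**kwargs)
```

Usage, with hypothetical names: `apply_plan(boto3.client("lambda"), plan("my-worker", "arn:aws:sqs:us-east-1:123456789012:my-queue"))`.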

I pump 10 messages into the queue and expect the Lambda to go through them one by one in reasonable (linear) time: with a reserved concurrency of exactly 1, that should be around 150 seconds at most in total.
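The 150-second expectation is simple arithmetic: with at most `concurrency` messages running at once and each taking at most 15 seconds, the queue drains in ceil(messages / concurrency) rounds of 15 seconds. A minimal sketch, assuming batch size 1, no retries and no delay between invocations; `worst_case_drain_seconds` is a hypothetical helper name:

```python
import math


def worst_case_drain_seconds(messages: int, concurrency: int, max_seconds: int) -> int:
    """Upper bound on the time to process `messages` messages, one per invocation,
    when at most `concurrency` invocations run at once and each takes at most
    `max_seconds`. Assumes no retries and no gaps between invocations."""
    rounds = math.ceil(messages / concurrency)  # "waves" of parallel invocations
    return rounds * max_seconds


print(worst_case_drain_seconds(10, 1, 15))  # 150
```

The same bound also gives 150 seconds for 700 messages at a reserved concurrency of 70, the second test described in the question, so waits of 10-15 minutes are well beyond what the concurrency limit alone accounts for.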

What happens is that the first few messages get processed in reasonable time, and then extreme slowness kicks in (15 minutes with no visible progress). The SQS queue reports all 10 messages as "in flight", which basically puts the blame on the Lambda or the code running inside it.

Has anybody experienced behavior like this, and if so, did they ever figure out why?

Edit: The same issue shows up in other tests as well. For example, with a reserved concurrency of 70, I pumped 700 messages into the queue (all at once, i.e. at machine speed), and the 698th message took around 10-15 minutes to get a Lambda to process it. I verified through logging that the code inside the Lambda does not take 10-15 minutes to execute (just the usual 8-12 seconds), so everything seems to point towards Lambda functions not being allocated as they should, but I have no way of proving or disproving this at present.
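One way to gather evidence for or against the "Lambda is not being allocated" theory is to log, from inside the function, how long each message waited in the queue and how many times SQS has handed it out. The SQS records that Lambda passes to the function carry `SentTimestamp`, `ApproximateFirstReceiveTimestamp` and `ApproximateReceiveCount` in their `attributes`. A minimal sketch; `queue_delay_report` is a hypothetical helper and the existing work is left as a placeholder comment:

```python
import json
import time


def queue_delay_report(record: dict, now_ms: int) -> dict:
    """Summarise how long one SQS record waited before this invocation ran."""
    attrs = record["attributes"]
    sent_ms = int(attrs["SentTimestamp"])  # epoch milliseconds, sent as a string
    first_receive_ms = int(attrs["ApproximateFirstReceiveTimestamp"])
    return {
        "messageId": record["messageId"],
        # Times SQS has handed this message to a consumer; > 1 means an earlier
        # receive did not end in a delete (e.g. a failed or throttled invoke).
        "receiveCount": int(attrs["ApproximateReceiveCount"]),
        # Send -> first receive by any consumer (the Lambda pollers, here).
        "msUntilFirstReceive": first_receive_ms - sent_ms,
        # Send -> start of this invocation's handling of the record.
        "msInQueueTotal": now_ms - sent_ms,
    }


def handler(event, context):
    for record in event["Records"]:
        report = queue_delay_report(record, int(time.time() * 1000))
        print(json.dumps(report))  # stdout goes to the function's CloudWatch Logs
        # ... the existing 8-12 second work on record["body"] would go here ...
```

If the slow messages show a receive count above 1 together with a long time in the queue, that would suggest they were received earlier but not processed to completion, rather than slow code.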

Edit 2: CloudWatch metrics of SQS/Lambda/AWS RDS

Peter
  • It looks like, at the time it stops processing, the *Approx number of Visible/Not-visible messages in the queue* drops to 0 in the next 5-minute reading. What's puzzling is that at the same time the *Approx age of the oldest message* goes up, even though according to the other metrics there are no messages in the queue. Maybe enable detailed monitoring to get better time resolution, both on Lambda and on SQS. – MLu Oct 30 '18 at 22:57
  • Is reserved concurrency of 1 even valid with SQS integration? [*"If you configure reserved concurrency on your function, set a minimum of 5 concurrent executions to reduce the chance of throttling errors when Lambda invokes your function."*](https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#events-sqs-eventsource) – Michael - sqlbot Oct 31 '18 at 00:33
  • I guess that would be a good explanation for one of the cases, but it still doesn't explain why the same thing happens with a reserved concurrency of 70, for example. Nevertheless, I will look into that more deeply as well. Thank you very much for your help! – Peter Oct 31 '18 at 01:10
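The documentation point about a minimum of 5 concurrent executions can be illustrated with a deliberately simplified model. Assume, hypothetically, that several pollers keep receiving messages while the reserved concurrency is already used up, and that a throttled message is left invisible until the queue's visibility timeout expires. This is a toy model for intuition, not Lambda's actual polling algorithm; `simulate`, the poller count of 5 and the 30-second visibility timeout (the SQS default) are all assumptions.

```python
def simulate(messages: int, pollers: int, concurrency: int,
             duration: int, visibility_timeout: int, horizon: int = 100_000) -> list:
    """Toy model (NOT Lambda's real algorithm). Returns each message's finish
    time in seconds. Assumes visibility_timeout > duration."""
    visible_at = [0] * messages        # second at which each message is receivable
    finished = [None] * messages       # finish time once a message has run
    poller_free_at = [0] * pollers     # a poller waits while its invocation runs
    running_until = []                 # end times of in-progress invocations
    for t in range(horizon):
        # Invocations that have ended release their concurrency slot.
        running_until = [end for end in running_until if end > t]
        for p in range(pollers):
            if poller_free_at[p] > t:
                continue
            # Receive one visible, unfinished message (batch size 1).
            candidates = [m for m in range(messages)
                          if finished[m] is None and visible_at[m] <= t]
            if not candidates:
                break
            m = candidates[0]
            if len(running_until) < concurrency:
                # Slot available: run it; the message is deleted when done.
                running_until.append(t + duration)
                finished[m] = t + duration
                poller_free_at[p] = t + duration
            else:
                # Throttled: the message stays "in flight" (invisible) until
                # its visibility timeout expires, then can be received again.
                visible_at[m] = t + visibility_timeout
        if all(f is not None for f in finished):
            break
    return finished
```

In this model, `max(simulate(10, pollers=1, concurrency=1, duration=12, visibility_timeout=30))` is 120 seconds, while with 5 pollers most messages sit in flight waiting out their visibility timeout and the last one finishes more than twice as late. A longer visibility timeout stretches that gap further in the model.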

1 Answer


Is your Lambda triggered by SQS, or is it running all the time and polling SQS?

  • Triggering is generally better: your Lambda is called as soon as something arrives in the queue, with the message as the event. One Lambda call per message (with a batch size of 1).

  • If you use polling, you must make sure the Lambda is re-scheduled when it exits. How do you do that? Could those 15 minutes of inactivity be related to the scheduling?
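For contrast, the polling style asked about here needs its own loop, and once that loop returns, nothing reads the queue until the function is invoked again. A minimal sketch, assuming a boto3 SQS client and the standard Lambda `context` object; `poll_until_empty`, the `process` callback and the 20-second safety margin are hypothetical choices:

```python
def poll_until_empty(sqs, queue_url: str, context, process, safety_ms: int = 20_000) -> int:
    """Drain the queue one message at a time until it looks empty or the
    invocation is close to its timeout. Returns the number of messages handled."""
    handled = 0
    # Leave room for one receive (1 s wait) plus one job (at most 15 s).
    while context.get_remaining_time_in_millis() > safety_ms:
        resp = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=1,
                                   WaitTimeSeconds=1)
        messages = resp.get("Messages", [])
        if not messages:
            # Queue looks empty: nothing reads it again until the next invocation.
            break
        msg = messages[0]
        process(msg["Body"])
        # Delete only after successful processing; otherwise the message
        # reappears once its visibility timeout expires.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        handled += 1
    return handled
```

Whatever re-invokes such a function (a schedule, for example) decides how long newly arriving messages wait, which is the kind of gap the answer suspects.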

MLu
  • I've updated my question with the answer: it is configured to be triggered; there is no polling. I agree that triggering is very much preferable in general. – Peter Oct 30 '18 at 21:56
  • @PeterMetz Please post the CloudWatch metrics for both the queue and the Lambda. They may give some interesting insights too. – MLu Oct 30 '18 at 22:07
  • See edit #2 with the screenshot. It contains a larger sample from various tests (the same test, just with different numbers of concurrent Lambdas ranging from 1 to 100). – Peter Oct 30 '18 at 22:40