
I have a Lambda function subscribed to an SNS topic, with the function's concurrency limit set to 5.

When 20 SNS messages are published, Lambda runs 5 instances and processes the first 5 messages; that is totally OK. According to the docs, the other messages are considered throttled and sent for retry. Again, this seems like expected behavior.

At this stage I have CloudWatch logs for 5 Lambda instances, each showing 1 processed message. Still OK.

Once the retry time comes, I see those 5 instances processing further messages; however, now each of the 5 Lambda instances has about 8 processed messages (it should be about 4 each, for a total of 20 messages across 5 instances). The instances processed some retried messages twice, both times successfully and under different request IDs.

It seems that SNS messages can sometimes be delivered twice, but given the numbers above, it looks like nearly every message is delivered and processed twice.

Can this somehow be avoided?

Lambda concurrency is limited to 5 only because, if I push 500 messages, Lambda will fire up 500 instances and bring my RDS instance down, since each Lambda instance creates its own database connection.
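(For reference, a minimal sketch of how such a limit can be set with boto3; the function name and region are placeholders, not part of my actual setup.)

```python
import boto3

# Placeholder function name and region -- adjust to your setup.
lambda_client = boto3.client("lambda", region_name="us-east-1")

# Reserve 5 concurrent executions so a burst of SNS messages cannot
# fan out into hundreds of simultaneous RDS connections.
lambda_client.put_function_concurrency(
    FunctionName="my-sns-consumer",
    ReservedConcurrentExecutions=5,
)
```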

So, the question again: why are retried messages processed twice if the first processing ended successfully? Also, detecting duplicates is hard because they are processed by a different instance each time.

antiplayer

1 Answer


This is a known observation (whether it's an issue is debatable): https://forums.aws.amazon.com/thread.jspa?threadID=252415&tstart=0

I noticed this started happening around two months ago. Unfortunately, the support forum topic above is not active enough to get AWS's attention.

From SNS FAQs:

Q: How many times will a subscriber receive each message?

Although most of the time each message will be delivered to your application exactly once, the distributed nature of Amazon SNS and transient network conditions could result in occasional, duplicate messages at the subscriber end. Developers should design their applications such that processing a message more than once does not create any errors or inconsistencies.

In the end, AWS does NOT guarantee that there will never be duplicates, so we have to design our applications around that.
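One common way to do that (not part of my original answer, just a sketch) is to record each SNS MessageId with a conditional write to a DynamoDB table and skip the work when the write fails because the ID was already seen. The table and attribute names below are hypothetical:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

# Hypothetical dedup table with "MessageId" (string) as the partition key.
DEDUP_TABLE = "processed-sns-messages"

def handler(event, context):
    for record in event["Records"]:
        message_id = record["Sns"]["MessageId"]
        try:
            # Conditional put: only succeeds the first time this MessageId is seen.
            dynamodb.put_item(
                TableName=DEDUP_TABLE,
                Item={"MessageId": {"S": message_id}},
                ConditionExpression="attribute_not_exists(MessageId)",
            )
        except ClientError as err:
            if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
                # Duplicate delivery -- already processed, skip it.
                continue
            raise
        process(record["Sns"]["Message"])  # your actual business logic

def process(message):
    ...
```

The conditional write makes the check-and-record step atomic, so two instances handling the same message cannot both "win", even though they run separately.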

In my application, I switched to writing into a DynamoDB table and using a DynamoDB Stream to trigger the Lambda function, instead of publishing to an SNS topic that the Lambda listens to.
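Roughly, the Lambda side then looks like the sketch below. This assumes the stream uses the NEW_IMAGE view type, and the Payload attribute name is illustrative:

```python
# Lambda handler for a DynamoDB Streams event source.
# Each invocation receives a batch of records from one shard, so
# concurrency is naturally bounded by the number of shards.
def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        new_image = record["dynamodb"]["NewImage"]
        payload = new_image["Payload"]["S"]  # hypothetical string attribute
        process(payload)

def process(payload):
    # Business logic that previously ran on the SNS message.
    ...
```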

Noel Llevares
  • Can you give a brief explanation of how you handle the Lambda concurrency limit for streaming event sources like DynamoDB? What happens when you add more records to the DB than there are Lambda instances available, or is a DynamoDB event trigger limited to running 1 Lambda instance? – antiplayer Dec 15 '17 at 02:24
  • I haven't reached that point for me to worry about that. Sorry. – Noel Llevares Dec 15 '17 at 02:42
  • This is interesting, and troubling if true, because arguments for "at least once" delivery really don't justify a behavior pattern like this. I'll see if I can duplicate, and try to draw some official attention to this if I can document it happening. – Michael - sqlbot Dec 15 '17 at 04:13
  • @Michael-sqlbot Thanks. Please do. If you read that forum topic, there are other people who have observed this as well, and we all have the same symptoms -- the duplicate message comes exactly 10 minutes after the first one. – Noel Llevares Dec 15 '17 at 04:51
  • @dashmug the requestId that remains the same across the reported duplicate invocations is created by Lambda, not SNS... isn't it? If so, then this would be a bug not in SNS but in Lambda, retrying asynchronous things that actually succeeded... wouldn't it? That makes me wonder if the function invocations that get incorrectly executed multiple times would also end up in a Lambda DLQ due to the same issue... do you have any of those configured? – Michael - sqlbot Dec 17 '17 at 17:18
  • @Michael-sqlbot That's interesting. I didn't configure DLQ at that time though. And now I've replaced my SNS usage with DynamoDB and DynamoDB streams so I can't collect data for that anymore. – Noel Llevares Dec 17 '17 at 23:15