How "real-time" is a DynamoDB stream?

Question

We are experimenting with a new serverless solution in which an external provider writes to DynamoDB, a DynamoDB Stream picks up each new write event, and the stream triggers an AWS Lambda function that propagates the changes downstream.

So far it works well. However, we sometimes notice that data is delayed, e.g. no updates come from Lambda for a few minutes.

I have gone through a lot of the DynamoDB Streams documentation, and the only term it uses is "near real-time stream record". But what does "near real-time" mean in practice? What delays should we expect?

*no updates would come from Lambda* wouldn't necessarily be the same as the data being delayed. The Lambda function could be the cause of the delay. Have you also verified that the function isn't being invoked previously, possibly throwing errors, and the data you get is the result of Lambda retries? — Michael - sqlbot, Feb 01 '17 at 00:40
@Michael-sqlbot isn't Lambda stateless? As I understood it, a new Lambda function execution shouldn't have any relation to a previous one. — inside, Feb 01 '17 at 20:12
That's not quite true. Your code should be designed that way, and each container only handles one concurrent invocation, but the containers are often reused... so your function can fail or stall on subsequent invocations if it has design errors triggered by something you didn't clean up. Errors are also retried after a delay. Read your CloudWatch logs and look for signs of problems. — Michael - sqlbot, Feb 01 '17 at 22:06

score 4 · Answer 1 · answered Feb 08 '17 at 00:02

In most cases, Lambda functions are triggered within half a second after you make an update to a small item in a Streams-enabled DynamoDB table. But event source changes, updates to the Lambda function, changes to the Lambda execution role, etc. may introduce additional latency when the Lambda function is run for the first time.

score 4 · Answer 2 · answered Mar 16 '19 at 16:52

In my experience, most of the time it is near real-time. However, on rare occasions you might have to wait a while (in my case, up to half an hour). I assume this was because of hardware or network issues in AWS infrastructure.

Ioannis Tsiokos


score 1 · Answer 3 · answered Apr 06 '23 at 21:39

The answer to this question is not simple and depends on multiple factors. Here are some of them:

Lambda Event Source Mapping

To connect a Lambda trigger to a DynamoDB stream you need an Event Source Mapping (ESM). The ESM polls the DynamoDB stream every 250 ms for new records, reading up to the number of records you've defined for BatchSize.

Lambda Timing

Lambda's processing time also plays a role. If your function has a long duration, the iterator age goes up, which in turn makes the stream appear slower.
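One way to tell a slow stream from a slow function is to log, inside the handler, how old each record is when it arrives. The following is a minimal sketch, not a definitive implementation. It assumes the standard DynamoDB Streams event shape, in which each record carries `dynamodb.ApproximateCreationDateTime` as epoch seconds. The names `record_lag_seconds` and `handler` are hypothetical, and the timestamp is only approximate (second-level granularity), so treat sub-second differences as noise.

```python
import time


def record_lag_seconds(record, now=None):
    """Seconds between a stream record's approximate write time and `now`."""
    # ApproximateCreationDateTime is assumed to be epoch seconds.
    created = float(record["dynamodb"]["ApproximateCreationDateTime"])
    if now is None:
        now = time.time()
    return now - created


def handler(event, context):
    # Hypothetical handler: log the worst lag in the batch before doing real work.
    lags = [record_lag_seconds(r) for r in event.get("Records", [])]
    if lags:
        print(f"batch_size={len(lags)} max_lag_s={max(lags):.3f}")
    return {"max_lag_seconds": max(lags) if lags else None}
```

If the logged lag is small but downstream updates are still late, the delay is more likely in the function's own work than in the stream.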

DynamoDB Stream Shard Rollover

DynamoDB's stream shards roll over every 4 hours: the current shard is marked read-only and a new shard is opened for writes. The ESM then has to do a shard discovery, which can also look like a spike in latency, though a short-lived one.
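To check whether a latency spike lines up with a rollover, you can look at which shards are currently open. The sketch below assumes the response shape of the DynamoDB Streams DescribeStream call, where a closed shard has an `EndingSequenceNumber` in its `SequenceNumberRange` and an open shard does not. The helper name `split_shards` is hypothetical, and the function works on a response dict you have already fetched (for example with boto3's `dynamodbstreams` client).

```python
def split_shards(describe_stream_response):
    """Return (open_shard_ids, closed_shard_ids) from a DescribeStream response."""
    shards = describe_stream_response["StreamDescription"]["Shards"]
    open_ids, closed_ids = [], []
    for shard in shards:
        seq_range = shard.get("SequenceNumberRange", {})
        # Assumption: a shard that has been closed for writes reports
        # an EndingSequenceNumber; an open shard omits it.
        if "EndingSequenceNumber" in seq_range:
            closed_ids.append(shard["ShardId"])
        else:
            open_ids.append(shard["ShardId"])
    return open_ids, closed_ids
```

Comparing the open shard IDs before and after a spike shows whether a new shard appeared in that window.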

Poison Pill

When you configure an ESM, the number of retries (MaximumRetryAttempts) is set to -1 by default. This means that in the event of an error, Lambda will keep retrying the failed event until it succeeds or is no longer visible in the stream, which happens after 24 hours (the retention period of streams). While this is happening, any item written to that partition is blocked behind the failing event, for up to 24 hours. Make sure you set the ESM's retry count to something other than -1.
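As a rough illustration of bounding the retries, the sketch below builds the arguments for boto3's `update_event_source_mapping` call. The values (3 retries, a one-hour maximum record age) and the helper names `bounded_retry_settings` and `apply_settings` are assumptions chosen for the example, not recommendations; the mapping UUID and the on-failure destination ARN are placeholders you would supply yourself.

```python
def bounded_retry_settings(mapping_uuid, on_failure_arn,
                           max_retries=3, max_age_seconds=3600):
    """Build kwargs for update_event_source_mapping with bounded retries."""
    if max_retries < 0:
        # -1 means "retry until the record expires", which is the poison-pill case.
        raise ValueError("max_retries must be >= 0 to avoid unbounded retries")
    return {
        "UUID": mapping_uuid,
        "MaximumRetryAttempts": max_retries,
        "MaximumRecordAgeInSeconds": max_age_seconds,
        # Split a failing batch in two so the bad record is isolated faster.
        "BisectBatchOnFunctionError": True,
        # Send details of discarded records to an SQS queue or SNS topic.
        "DestinationConfig": {"OnFailure": {"Destination": on_failure_arn}},
    }


def apply_settings(settings):
    # boto3 is imported lazily so the builder above can be used without it.
    import boto3
    return boto3.client("lambda").update_event_source_mapping(**settings)
```

With an on-failure destination in place, a record that exhausts its retries is skipped and reported instead of holding up everything behind it.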
