
I have a pipeline like this:
table 1 (DynamoDB) -> AWS Lambda -> table 2 (DynamoDB)

So whenever an update happens in table 1, the Lambda gets triggered. The Lambda basically does a batch read (1000 records) from table 1, then performs a batch compute to come up with the list of records that need to be updated in table 2. Table 2 basically maintains the count of certain events happening in table 1. Roughly, my handler does something like the sketch below.
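This is only a minimal sketch of the current (non-idempotent) logic; the table2 name and the eventType/eventCount attributes are placeholders, not my real schema:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
counts_table = dynamodb.Table("table2")  # placeholder name for the counts table

def handler(event, context):
    # Each invocation receives a batch of change records from the table 1 stream.
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        new_image = record["dynamodb"]["NewImage"]
        event_type = new_image["eventType"]["S"]  # placeholder attribute

        # The counter is incremented unconditionally, so replaying the same
        # batch after a failure increments it a second time.
        counts_table.update_item(
            Key={"eventType": event_type},
            UpdateExpression="ADD eventCount :one",
            ExpressionAttributeValues={":one": 1},
        )
```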

So the problem is: if the same batch of records is sent twice, the count in table 2 will be incremented twice.

Why am I considering this: if one of the Lambda functions has an outage (the number of Lambdas running has a 1:1 relation with the number of partitions in DynamoDB) after it had already performed some of the write operations, the last batch it read will be resent.

To avoid this, one option is to store the sequence numbers of the records we have already processed in table 2, so that before every update we can check whether a record has already been counted. But we need to bound the size of that list or we will run into performance issues, and it's not clear what that size should be. A sketch of what I have in mind follows.
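Concretely, I imagine something like the sketch below. It is only an illustration of the idea, not what I actually run: the separate processed dedup table, its sequenceNumber key, the expiresAt TTL attribute, and the 24-hour window are all assumptions. Each record's stream sequence number is written with a conditional put before the count is incremented, so a replayed record is skipped, and a TTL keeps the list from growing without bound:

```python
import time
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
counts_table = dynamodb.Table("table2")        # placeholder counts table
processed_table = dynamodb.Table("processed")  # hypothetical dedup table

def already_processed(sequence_number):
    """Record the stream sequence number; return True if it was seen before."""
    try:
        processed_table.put_item(
            Item={
                "sequenceNumber": sequence_number,
                # TTL attribute so the dedup table trims itself (assumed 24 h window).
                "expiresAt": int(time.time()) + 24 * 60 * 60,
            },
            # The put fails if this sequence number was already recorded.
            ConditionExpression="attribute_not_exists(sequenceNumber)",
        )
        return False
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return True
        raise

def handler(event, context):
    for record in event["Records"]:
        seq = record["dynamodb"]["SequenceNumber"]
        if already_processed(seq):
            continue  # replayed record: skip the increment
        new_image = record["dynamodb"]["NewImage"]
        event_type = new_image["eventType"]["S"]  # placeholder attribute
        counts_table.update_item(
            Key={"eventType": event_type},
            UpdateExpression="ADD eventCount :one",
            ExpressionAttributeValues={":one": 1},
        )
```

The conditional put makes the "have I seen this sequence number?" check and the recording of it a single atomic operation, which is what would break the double count on replay. But it still leaves the sizing/TTL question open, which is what I'm asking about.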

What would be the right approach to handle this kind of issue?

  • Are you using DynamoDB Streams to stream the data to the Lambda function? The DynamoDB stream only sends the changed values; it doesn't send any data when nothing has changed. – notionquest Jul 14 '17 at 10:37
  • Yes, I am using DynamoDB Streams to stream data to the Lambda function. The scenario I am talking about is: the Lambda function receives a batch of records, but while processing that batch (say n of those records were processed and had updated table 2) the Lambda goes down; when it comes back up it retries the same batch, those n records are processed again, and that results in a wrong count in table 2. – rcipher222 Jul 15 '17 at 07:31
  • For my taste, you are using the wrong technology stack here. You are trying to achieve transactional behaviour in a non-transactional database. Consider switching to a regular RDBMS. – Adam Owczarczyk Jul 27 '17 at 09:45
  • @rcipher222: I'm in a similar situation. Were you able to figure it out? – dashuser Dec 20 '19 at 20:39

0 Answers