
I currently have a use case to copy data from DDB Streams to Kinesis Data Streams (just to increase the data retention period). With DDB Streams, retention is just 24 hours, whereas Kinesis Data Streams supports up to 7 days.

So I was thinking of a Lambda to copy the items from DDB Streams to Kinesis Data Streams, but I'm not sure whether ordering / duplicate-record issues would come into play when I do the copy. I'm guessing consumer failures (i.e., Lambda failures) might result in out-of-order delivery of stream records to Kinesis, and there might also be duplicate records in the Kinesis Data Stream. Is there an AWS- or customer-built solution to handle this, or any workaround?
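For what it's worth, a minimal sketch of such a copy Lambda might look like the following. This is an illustration, not a tested solution: the stream name `copied-ddb-stream` and the helper `to_kinesis_record` are my own inventions. The idea is to use the item's DynamoDB key as the Kinesis partition key (so all versions of one item land on one shard, preserving per-item order) and to carry the DDB `SequenceNumber` in the payload so downstream consumers can detect duplicates caused by retries.

```python
import json

def to_kinesis_record(ddb_record):
    """Map one DynamoDB Streams record to a Kinesis PutRecords entry.

    The item's key becomes the partition key, so every change to the
    same item goes to the same Kinesis shard (per-item ordering).
    The DDB SequenceNumber rides along in the payload so consumers
    can drop duplicates after a retried batch.
    """
    keys = ddb_record["dynamodb"]["Keys"]
    seq = ddb_record["dynamodb"]["SequenceNumber"]
    payload = {"sequenceNumber": seq, "record": ddb_record}
    return {
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": json.dumps(keys, sort_keys=True),
    }

def handler(event, context):
    import boto3  # imported here so the helper above stays testable offline
    kinesis = boto3.client("kinesis")
    entries = [to_kinesis_record(r) for r in event["Records"]]
    resp = kinesis.put_records(StreamName="copied-ddb-stream", Records=entries)
    # On partial failure, raise so Lambda retries the whole batch.
    # That retry is exactly what can produce duplicates downstream,
    # which is why SequenceNumber is included in every payload.
    if resp.get("FailedRecordCount", 0):
        raise RuntimeError("partial failure; batch will be retried")
```

Note the retry behavior here gives at-least-once delivery, so dedup still has to happen on the consumer side.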

Also, the reason I opted for Kinesis Data Streams / DDB Streams is that I'm going to have a Lambda work off the stream, and I'd like the Lambdas to be triggered per shard.

dashuser

1 Answer


Since you have one producer (the DynamoDB stream), you can have a Lambda function consume the stream and insert the records into a FIFO SQS queue. You can then deduplicate events by following this post:

https://dev.to/napicella/deduplicating-messages-exactly-once-processing-4o2

By the way, you can set the SQS retention period to up to 14 days, so you could use it instead of Kinesis if you're not looking for a real-time solution.

A sample use case: https://fernandomc.com/posts/aws-first-in-first-out-queues/
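A rough sketch of that Lambda, assuming a FIFO queue (the queue URL below is a placeholder and `to_sqs_entry` is a hypothetical helper): the item's key becomes the `MessageGroupId` (FIFO queues guarantee ordering within a message group), and the DDB `SequenceNumber` becomes the `MessageDeduplicationId`, so SQS itself drops duplicates delivered within its 5-minute dedup window.

```python
import json

def to_sqs_entry(ddb_record, entry_id):
    """Build one SendMessageBatch entry for a FIFO queue.

    MessageGroupId = item key  -> per-item ordering inside the queue.
    MessageDeduplicationId = DDB SequenceNumber -> SQS discards
    duplicates of the same stream record within the dedup window.
    """
    keys = ddb_record["dynamodb"]["Keys"]
    seq = ddb_record["dynamodb"]["SequenceNumber"]
    return {
        "Id": str(entry_id),
        "MessageBody": json.dumps(ddb_record),
        "MessageGroupId": json.dumps(keys, sort_keys=True),
        "MessageDeduplicationId": seq,
    }

def handler(event, context):
    import boto3  # imported here so the helper above stays testable offline
    sqs = boto3.client("sqs")
    entries = [to_sqs_entry(r, i) for i, r in enumerate(event["Records"])]
    # SendMessageBatch accepts at most 10 entries per call.
    for i in range(0, len(entries), 10):
        sqs.send_message_batch(
            QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/ddb-copy.fifo",  # placeholder
            Entries=entries[i:i + 10],
        )
```

One caveat: SQS-based dedup only covers duplicates arriving within the 5-minute window, so a consumer-side idempotency check is still wise for late retries.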

AWS PS
  • Nice, that's a good one! But the thing with Kinesis Data Streams is that I can get multiple Lambdas triggered off each shard, and I can deal with that number of concurrently running Lambdas. With SQS, I don't think the Lambda triggers would be based on the shards. That's my constraint for even considering Kinesis Data Streams and DDB Streams. Your approach would work for me if I didn't have that constraint. (Will update the question with this constraint!) – dashuser Jan 03 '20 at 22:08