11

I am using Data Pipeline (DP) for daily backups of DynamoDB. However, I would also like to take incremental backups of the data that DP runs miss (updates made between DP runs). To accomplish that, I would like to use DynamoDB Streams + Lambda + S3 to bring real-time DynamoDB updates to S3. I understand how DynamoDB Streams work, but I am struggling to create a Lambda function that writes to S3 and, say, rolls a file every hour.

Has anyone tried it?

Shay Ashkenazi
user3293898
  • What do you mean by "rolls a file every hour"? You can't append to files in S3. You would have to create a new file for each update, unless you wanted to read the entire file each time, add data to it, and then write it back to S3 again, which sounds painful. – garnaat Mar 11 '16 at 15:51
  • Rolling a file as in log4j or other frameworks, where files can be rolled based on certain criteria. I know files in S3 are immutable, so I was wondering if this is even possible. – user3293898 Mar 15 '16 at 18:18
  • This article explains the flow really nicely. In this case it's triggered via TTL, but you can change that part. https://aws.amazon.com/blogs/database/automatically-archive-items-to-s3-using-dynamodb-time-to-live-with-aws-lambda-and-amazon-kinesis-firehose/ – Oguz Aug 16 '22 at 12:16
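
To illustrate the point made in the comments above (S3 objects cannot be appended to, so "rolling" has to mean writing new objects), here is a minimal Python sketch of a Lambda handler that writes each stream batch as a new object under a key prefix that changes every hour. The bucket name `my-backup-bucket`, the prefix `ddb-incremental`, and the `s3_client` parameter are hypothetical and exist only for illustration. This produces many small objects per hour rather than one hourly file.

```python
import json
from datetime import datetime, timezone


def hourly_key(prefix, batch_id, now):
    """Build an S3 key whose prefix changes every hour (UTC)."""
    return f"{prefix}/{now:%Y/%m/%d/%H}/{batch_id}.json"


def handler(event, context, s3_client=None, bucket="my-backup-bucket"):
    """Write one DynamoDB stream batch to S3 as a new object.

    `bucket` and the "ddb-incremental" prefix are placeholder names.
    """
    records = event.get("Records", [])
    if not records:
        return None

    if s3_client is None:
        # Imported lazily so the helper above can be tested without boto3.
        import boto3
        s3_client = boto3.client("s3")

    # One JSON document per line; each stream record keeps its eventName
    # (INSERT / MODIFY / REMOVE) and its "dynamodb" payload.
    body = "".join(json.dumps(r, default=str) + "\n" for r in records)

    # The first record's eventID is used as a per-batch object name.
    key = hourly_key("ddb-incremental", records[0]["eventID"],
                     datetime.now(timezone.utc))
    s3_client.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return key
```

If a single file per hour is really needed, a separate scheduled job could concatenate the objects under the previous hour's prefix. That step is not shown here.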

1 Answer

12

It's an hour's job, dude. What you need to do is:

  1. Enable a DynamoDB stream on the table and attach the AWS-provided Lambda function https://github.com/awslabs/lambda-streams-to-firehose
  2. Create a Firehose delivery stream and use the above function to forward the stream records into Firehose.
  3. Configure Firehose to deliver the records to S3.

Done.
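
For anyone who wants to see roughly what the forwarding step in (2) involves, below is a minimal Python sketch. It is not the code from the awslabs repository linked above, just an illustration of the same idea using boto3's `put_record_batch`. The delivery stream name `ddb-backup-stream` and the `firehose_client` parameter are hypothetical.

```python
import json


def to_firehose_records(event):
    """Turn each DynamoDB stream record into one newline-terminated JSON blob."""
    return [
        {"Data": (json.dumps(r, default=str) + "\n").encode("utf-8")}
        for r in event.get("Records", [])
    ]


def chunks(items, size=500):
    """Yield slices of at most `size` items (PutRecordBatch accepts up to 500)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def handler(event, context, firehose_client=None,
            stream_name="ddb-backup-stream"):
    """Forward a DynamoDB stream batch to a Firehose delivery stream.

    `stream_name` is a placeholder; use your own delivery stream's name.
    """
    if firehose_client is None:
        # Imported lazily so the helpers above can be tested without boto3.
        import boto3
        firehose_client = boto3.client("firehose")

    records = to_firehose_records(event)
    failed = 0
    for batch in chunks(records):
        resp = firehose_client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += resp.get("FailedPutCount", 0)

    if failed:
        # Raising makes Lambda retry the whole stream batch, so records that
        # were already accepted may be delivered again (duplicates possible).
        raise RuntimeError(f"{failed} records were rejected by Firehose")
    return len(records)
```

Firehose then handles the buffering and writes the batched records to S3, so the function itself stays stateless.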

AnubhavJain
  • What if you don't have Firehose available? – Cacho Santa Jan 19 '18 at 19:31
  • @CachoSanta Firehose buffers the data and delivers it to S3 in chunks. Without Firehose you will have to implement the buffering logic yourself, which is tricky with Lambda. By tricky, I mean that you will have to use some external service. – Arry Feb 24 '19 at 18:18
  • @Arry Same question here. If I need to stream DDB changes into an S3 bucket, and the dependency I use needs to have the same regional availability as Lambda, what other options do I have? Lambda is available in every AWS region except Osaka. https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/ – Jialun Liu Jun 29 '20 at 21:02
  • Can we use AWS Kinesis here? – mshikher Jan 28 '21 at 22:59