
I am trying to back up DynamoDB data into S3 using AWS Data Pipeline, scheduled to run every 15 minutes in the pipeline settings. The template I used is the default one, "Export DynamoDB table to S3".

The problem is easiest to explain with an example.

Initial state of the table -> 3 rows are present. With the first save into S3, I get all 3 rows.

Before the second save into S3, I add one more row to the table.

New state of the table -> 4 rows are present. With the second save into S3, I get all 4 rows, but I want to save only the newly added row.

How can I achieve this?

And one more thing: is there any way to delete the last backup saved into S3 before saving the new one?


1 Answer


DynamoDB can back up data by itself; that mechanism is used to create point-in-time snapshots (point-in-time recovery).
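For reference, point-in-time recovery can be switched on with a single API call. A minimal sketch with boto3, assuming a hypothetical table named MyTable:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Enable continuous backups (point-in-time recovery) on the table.
# "MyTable" is a placeholder; use your own table name.
dynamodb.update_continuous_backups(
    TableName="MyTable",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)
```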

But if you still need custom backups that stay in sync with DynamoDB with acceptable latency, you can have a Lambda function read from the DynamoDB stream and write to S3. That ensures you only write the items that actually changed.
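A minimal sketch of such a Lambda handler, assuming the table's stream is enabled with the NEW_AND_OLD_IMAGES view type and a hypothetical bucket named my-dynamodb-incremental-backups:

```python
import json
import boto3

# Hypothetical bucket name; replace with your own.
BUCKET = "my-dynamodb-incremental-backups"

s3 = boto3.client("s3")

def handler(event, context):
    """Triggered by a DynamoDB stream; writes each changed item to S3."""
    for record in event["Records"]:
        # eventID is unique per stream record, so it makes a safe object key.
        key = f"backups/{record['eventID']}.json"
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage holds the item as it looks after the change,
            # in DynamoDB's attribute-value JSON format.
            item = record["dynamodb"]["NewImage"]
        else:  # REMOVE: keep the deleted item's key for the audit trail
            item = record["dynamodb"]["Keys"]
        s3.put_object(
            Bucket=BUCKET,
            Key=key,
            Body=json.dumps({"event": record["eventName"], "item": item}),
        )
    return {"processed": len(event["Records"])}
```

With this approach you no longer need the 15-minute full export at all: each stream record is delivered to the function once, so only the inserted, modified, or removed items land in S3, which also addresses the concern about re-saving the full table on every run.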
