
I want to archive a DynamoDB table, keeping data for only 90 days. I have a field called recorded_on in the table which I can use to track the 90 days. I looked at Data Pipeline, and it seems like overkill since it uses EMR, which we don't need. Are there better ways to do this?

1. A cron job that runs every day, matches rows where recorded_on is more than 90 days before today's date, puts those rows in S3, and then deletes them (see the sketch after this list).

2. A separate cron job that loads the data from S3 into Redshift every day.
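Here's roughly what I had in mind for option 1, as a sketch with boto3. The table name, key attribute ("id"), bucket name, and the assumption that recorded_on is stored as an ISO-8601 date string are all placeholders; only recorded_on comes from my actual table.

```python
import json
from datetime import datetime, timedelta

import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")
table = dynamodb.Table("my-table")  # placeholder table name

# Anything recorded before this date is older than 90 days.
cutoff = (datetime.utcnow() - timedelta(days=90)).strftime("%Y-%m-%d")

def archive_and_delete():
    scan_kwargs = {"FilterExpression": Attr("recorded_on").lt(cutoff)}
    page_num = 0
    while True:
        page = table.scan(**scan_kwargs)
        items = page.get("Items", [])
        if items:
            # Write this batch to S3 as one JSON object per line.
            body = "\n".join(json.dumps(item, default=str) for item in items)
            s3.put_object(
                Bucket="my-archive-bucket",  # placeholder bucket name
                Key=f"archive/{cutoff}/page-{page_num:05d}.json",
                Body=body,
            )
            # Delete what was just archived; batch_writer batches the deletes.
            with table.batch_writer() as batch:
                for item in items:
                    batch.delete_item(Key={"id": item["id"]})  # placeholder key
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
        page_num += 1
```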
user3089927

3 Answers


Why do you think using AWS Data Pipeline is overkill? You can use a custom job, but it will require additional work that Data Pipeline does for you automatically.

The fact that it uses an EMR cluster behind the scenes shouldn't be a problem, as those details are abstracted away from you anyway. Setting up a pipeline to archive DynamoDB to S3 is very easy. For deleting data older than 90 days, you can write a custom script and use the Data Pipeline ShellCommandActivity (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-shellcommandactivity.html) to execute it.
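The script the ShellCommandActivity runs can stay very small, since the export to S3 is handled by the pipeline itself. A sketch of a delete-only pass, assuming the table's key attribute is "id", recorded_on is an ISO-8601 date string, and "my-table" is a placeholder name:

```python
# Delete-only script for the ShellCommandActivity to run after the export.
from datetime import datetime, timedelta

import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("my-table")  # placeholder table name
cutoff = (datetime.utcnow() - timedelta(days=90)).strftime("%Y-%m-%d")

scan_kwargs = {
    "FilterExpression": Attr("recorded_on").lt(cutoff),
    "ProjectionExpression": "id",  # only the key is needed to delete
}
while True:
    page = table.scan(**scan_kwargs)
    with table.batch_writer() as batch:
        for item in page.get("Items", []):
            batch.delete_item(Key={"id": item["id"]})
    if "LastEvaluatedKey" not in page:
        break
    scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```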

Here are some benefits of Data Pipeline over CRON:

  1. Retries in case of failures.
  2. Monitoring/Alarms.
  3. No need to provision EC2; AWS takes care of everything behind the scenes.
  4. Control over how much DynamoDB capacity the export can use; this is very important for preventing the export job from impacting other systems.

It's also very cheap: https://aws.amazon.com/datapipeline/pricing/.

Regards, Dinesh Solanki

dinesh

You could create a scheduled Lambda function that runs daily (or at whatever interval you want), performs the query, and archives the items.

Or, if you want that to scale and perform better, you could have the first Lambda function perform the query and write a message to an SNS topic for each item that needs to be archived, and have a second Lambda function trigger on that SNS topic and perform the archive operation.
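A minimal sketch of that second pattern. The table name, key attribute ("id"), bucket name, topic environment variable, and the assumption that recorded_on is an ISO-8601 date string are all placeholders, not part of the question:

```python
import json
import os
from datetime import datetime, timedelta

import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")           # placeholder table name
sns = boto3.client("sns")
s3 = boto3.client("s3")
TOPIC_ARN = os.environ["ARCHIVE_TOPIC_ARN"]  # placeholder env var
BUCKET = "my-archive-bucket"                 # placeholder bucket name

def finder_handler(event, context):
    """Runs on a schedule: find expired items and publish one SNS message each."""
    cutoff = (datetime.utcnow() - timedelta(days=90)).strftime("%Y-%m-%d")
    scan_kwargs = {"FilterExpression": Attr("recorded_on").lt(cutoff)}
    while True:
        page = table.scan(**scan_kwargs)
        for item in page.get("Items", []):
            sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps(item, default=str))
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

def archiver_handler(event, context):
    """Subscribed to the SNS topic: archive each item to S3, then delete it."""
    for record in event["Records"]:
        item = json.loads(record["Sns"]["Message"])
        s3.put_object(
            Bucket=BUCKET,
            Key=f"archive/{item['id']}.json",  # placeholder key attribute
            Body=json.dumps(item),
        )
        table.delete_item(Key={"id": item["id"]})
```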

garnaat

I know this is an old question, but for the sake of anyone stumbling upon this question:

You can now use the DynamoDB TTL (Time to Live) feature to automatically delete old data. A Lambda function triggered by the table's stream can then archive the deleted record to S3, or wherever you'd like.

There is a detailed post on how to achieve exactly this on the AWS blog: https://aws.amazon.com/blogs/database/automatically-archive-items-to-s3-using-dynamodb-time-to-live-with-aws-lambda-and-amazon-kinesis-firehose/
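The blog post puts Kinesis Firehose between the stream and S3; a Lambda-only sketch of the archiver (with a placeholder bucket name) would look something like this. It assumes TTL is enabled on the table and the stream includes old images, and it filters on the stream record's userIdentity so that only TTL deletions, not user deletes, get archived:

```python
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "my-archive-bucket"  # placeholder bucket name

def handler(event, context):
    for record in event.get("Records", []):
        if record.get("eventName") != "REMOVE":
            continue
        # TTL deletions are performed by the DynamoDB service principal.
        identity = record.get("userIdentity", {})
        if identity.get("principalId") != "dynamodb.amazonaws.com":
            continue
        # OldImage is in DynamoDB's attribute-value format; store it as-is.
        old_image = record["dynamodb"].get("OldImage", {})
        key = f"ttl-archive/{record['eventID']}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(old_image))
```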

chib