
I have a DynamoDB table on which I need to perform these actions on a weekly/monthly basis:

  1. Export the data to S3
  2. Delete the exported data from DynamoDB

Use case: We have only 10% of our traffic enabled so far and around 3k items, and the table is growing. We also need to give another account access to this data and prefer not to grant access to the table directly. To keep retrieval fast, to allow access from the other account, and because the data may not be used again in the near future, we are planning to export the data to S3.

Options:

  1. Data Pipeline is too complex, and we don't wish to use an EMR cluster.
  2. Not going with Glue, since there is no analysis to be performed.
  3. AWS's built-in DynamoDB export to S3

Planning for the S3 export (option 3) plus a Lambda function, triggered by an EventBridge rule, to schedule the export and then delete the DynamoDB records.
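
Roughly what I have in mind, as a minimal sketch only (the pk/sk key schema, the created_at attribute, and the environment variable names below are placeholders, not our real setup):

```python
# Sketch of the scheduled Lambda: one handler kicks off the native DynamoDB
# export to S3, a second handler deletes the items that were exported.
# TABLE_NAME, TABLE_ARN, EXPORT_BUCKET, pk/sk, and created_at are assumptions.
import os
import time
import boto3

dynamodb = boto3.client("dynamodb")
table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])

def start_export(event, context):
    # The native export requires point-in-time recovery (PITR) on the table.
    resp = dynamodb.export_table_to_point_in_time(
        TableArn=os.environ["TABLE_ARN"],
        S3Bucket=os.environ["EXPORT_BUCKET"],
        S3Prefix=time.strftime("exports/%Y-%m/"),
        ExportFormat="DYNAMODB_JSON",
    )
    return resp["ExportDescription"]["ExportArn"]

def delete_exported(event, context):
    # Delete items older than the cutoff passed in by the schedule.
    # This scans the whole table, so cost grows with table size --
    # fine at ~3k items, revisit as the table grows.
    cutoff = event["cutoff_iso"]  # e.g. "2024-09-01T00:00:00Z"
    scan_kwargs = {
        "FilterExpression": "created_at < :cutoff",
        "ExpressionAttributeValues": {":cutoff": cutoff},
        "ProjectionExpression": "pk, sk",  # assumed key schema
    }
    with table.batch_writer() as batch:
        while True:
            page = table.scan(**scan_kwargs)
            for item in page["Items"]:
                batch.delete_item(Key={"pk": item["pk"], "sk": item["sk"]})
            if "LastEvaluatedKey" not in page:
                break
            scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```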

Will this suffice, or is there a better approach? Please advise.

Reshma

2 Answers


A few options to consider:

Evergreen tables pattern

  1. Create a new table each month and have your application write to the new table based on the current time
  2. When the new month comes, the old month's table can be exported to S3
  3. Delete the old month's table once the export is done and you no longer need it

This one is probably the most cost effective because you can better control how long the items sit around. The biggest hassle is needing to provision new tables, update permissions, and have application logic to switch at the right time. Once it's up and running, it should be smooth, though. This is a pattern that's really common for folks using DDB for things like ML models, where they rotate tables regularly and don't want to pay for deleting all the old data. If you have strict SLAs on how long old data can stick around, this might be the best option.
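
As a rough sketch, the application-side switch is just deriving the table name from the current month (the base name and item shape here are made up):

```python
# Sketch of the "evergreen tables" routing: the app always writes to the
# table for the current month; last month's table gets exported and dropped.
# The base name "orders" and the item schema are placeholders.
import datetime
import boto3

BASE = "orders"

def current_table_name(now=None):
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return f"{BASE}-{now:%Y-%m}"  # e.g. "orders-2024-09"

def write_item(item):
    table = boto3.resource("dynamodb").Table(current_table_name())
    table.put_item(Item=item)
```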

TTL pattern

  1. Set all your data to TTL at the end of the month
  2. Export your data before the TTL window
  3. Let TTL expire the items

This has the issue that TTL can take a fairly long time (days) to clean up a lot of items, since it uses background WCUs, which means you pay for the storage a bit longer. The plus side is that it's cost effective on WCUs. If you don't have a compliance need to get the data off DDB at a specific time, this works fine.
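
A minimal sketch of the setup, assuming the TTL attribute is called expires_at (the table name and item shape are made up):

```python
# Sketch of the TTL pattern: enable TTL once on the table, then stamp every
# item with an end-of-month expiry epoch. Names here are placeholders.
import calendar
import datetime
import boto3

TABLE = "orders"

def enable_ttl():
    # One-time setup: tell DynamoDB which attribute holds the expiry epoch.
    boto3.client("dynamodb").update_time_to_live(
        TableName=TABLE,
        TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
    )

def end_of_month_epoch(now=None):
    now = now or datetime.datetime.now(datetime.timezone.utc)
    last_day = calendar.monthrange(now.year, now.month)[1]
    end = now.replace(day=last_day, hour=23, minute=59, second=59, microsecond=0)
    return int(end.timestamp())

def write_item(item):
    item["expires_at"] = end_of_month_epoch()  # TTL expects epoch seconds
    boto3.resource("dynamodb").Table(TABLE).put_item(Item=item)
```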

Glue scan and delete pattern

I say use Glue, but really it's just that Spark-like things are pretty effective at doing stuff like this, even if it isn't analytics. You can also make it work with something like Step Functions, if you'd rather do that.

  1. Kick off the export
  2. Use the exported data in Glue to kick off deletes against DDB

This has the downside of being fairly expensive (gotta have extra WCUs to handle the deletes). It's fairly simple from your application's perspective, though. If you can't change application logic (to set TTL or to change which table is being written to), I'd go with this option.
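
If you'd rather go the Step Functions / plain Python route instead of Spark, the delete step can just walk the export files and replay the keys as deletes. A rough sketch, assuming a pk/sk string key schema and made-up bucket/table names:

```python
# Sketch of the "read the export, then delete" step: parse the DYNAMODB_JSON
# export files from S3 and batch-delete the keys. Could run as a Glue Python
# shell job, a Step Functions task, or a Lambda. All names are placeholders.
import gzip
import json
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("orders")

def delete_exported_items(bucket, export_prefix):
    # export_prefix points at the export's AWSDynamoDB/<export-id> folder.
    paginator = s3.get_paginator("list_objects_v2")
    with table.batch_writer() as batch:
        for page in paginator.paginate(Bucket=bucket, Prefix=f"{export_prefix}/data/"):
            for obj in page.get("Contents", []):
                if not obj["Key"].endswith(".json.gz"):
                    continue
                body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
                for line in gzip.decompress(body).splitlines():
                    item = json.loads(line)["Item"]  # DynamoDB-JSON encoded
                    batch.delete_item(Key={
                        "pk": item["pk"]["S"],
                        "sk": item["sk"]["S"],
                    })
```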

Chris Anderson

You can use https://www.npmjs.com/package/dynoport to export data from DynamoDB in a highly performant way and push it to S3 using an ECS cron job.