A few options to consider:
Evergreen tables pattern
- Create a new table each month and have your application write to the current month's table based on the current time
- When a new month starts, export the previous month's table to S3
- Delete the previous month's table once the export is done and you no longer need the data
This one is probably the most cost-effective. You have much tighter control over how long items sit around, and dropping a whole table doesn't consume write capacity the way deleting items one by one does. The biggest hassle is needing to provision new tables, update permissions, and have application logic that switches tables at the right time. Once it's up and running, though, it should be smooth. This pattern is really common for folks using DDB for things like ML model data that gets rotated regularly, where they don't want to pay for deleting all the old data. If you have strict SLAs on how long old data can stick around, this might be the best option.
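A minimal sketch of the table-switching logic. It assumes a hypothetical `<prefix>_YYYY_MM` naming scheme and UTC month boundaries; the prefix and the naming scheme are illustrative, not anything DynamoDB requires. Writers would call `monthly_table_name` on every write, and a scheduled job would use `previous_month_table_name` to find the table to export and eventually delete.

```python
from datetime import datetime, timezone


def monthly_table_name(prefix: str, now: datetime) -> str:
    """Table the application writes to for the month containing `now` (UTC).

    Hypothetical naming scheme: <prefix>_YYYY_MM.
    """
    now = now.astimezone(timezone.utc)
    return f"{prefix}_{now.year:04d}_{now.month:02d}"


def previous_month_table_name(prefix: str, now: datetime) -> str:
    """Last month's table: the one to export to S3 and later delete."""
    now = now.astimezone(timezone.utc)
    if now.month == 1:
        year, month = now.year - 1, 12  # January rolls back to last December
    else:
        year, month = now.year, now.month - 1
    return f"{prefix}_{year:04d}_{month:02d}"
```

Converting to UTC first means every writer agrees on when the month flips, regardless of each host's time zone. The next month's table (and its permissions) still has to exist before that boundary, so the same scheduled job is a natural place to create it.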
TTL pattern
- Set a TTL attribute on every item so that it expires at the end of the month
- Export your data before that expiry time
- Let TTL expire and delete the items
This has the issue that TTL can take a fairly long time (days) to clean up a large number of items, because deletion runs as a background process rather than on your own write capacity, which means you pay for the storage a bit longer. The plus side is that it's cost-effective: TTL deletes don't consume your WCUs at all. If you don't have a compliance need to get the data off DDB at a specific time, this works fine.
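A minimal sketch of computing the end-of-month TTL value. DynamoDB TTL reads a Number attribute holding a Unix epoch timestamp in seconds; the attribute name `expires_at` and the `with_ttl` helper below are hypothetical and would need to match whatever attribute you enable TTL on.

```python
from datetime import datetime, timezone


def end_of_month_ttl(now: datetime) -> int:
    """Epoch seconds for 00:00:00 UTC on the first day of the month after `now`.

    Pass a timezone-aware datetime. Items become eligible for TTL deletion
    once this time has passed.
    """
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1  # December rolls over to next January
    else:
        year, month = now.year, now.month + 1
    return int(datetime(year, month, 1, tzinfo=timezone.utc).timestamp())


def with_ttl(item: dict, now: datetime, attr: str = "expires_at") -> dict:
    """Return a copy of `item` with the (hypothetical) TTL attribute set."""
    return {**item, attr: end_of_month_ttl(now)}
```

The export has to finish before that timestamp; once it passes, expired items are removed in the background on the days-long timeline described above.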
Glue scan and delete pattern
I say Glue, but really it's just that Spark-like tools are pretty effective at bulk jobs like this, even when it isn't analytics. You could also make it work with something like Step Functions, if you'd rather.
- Kick off an export of the table to S3
- Read the exported data in Glue and have Glue issue a delete to DDB for each exported item
This has the downside of being fairly expensive (you need extra WCUs to handle the deletes, since every delete consumes write capacity). It's fairly simple from your application's perspective, though. If you can't change application logic (to set TTL or to switch which table is being written to), I'd go with this option.
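A minimal sketch of the delete step. It assumes the export was written in DynamoDB JSON format (one `{"Item": {...}}` object per line, already decompressed) and that the table's key attributes are the hypothetical `pk` and `sk`. It turns exported items into `RequestItems` payloads for `BatchWriteItem`, which accepts at most 25 requests per call. In Glue you would run something like this per partition and send each payload with boto3's `client.batch_write_item(RequestItems=...)`, retrying anything returned in `UnprocessedItems`. The AWS call itself is left out so the sketch runs offline.

```python
import json
from typing import Iterable, Iterator, Sequence

MAX_BATCH = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def keys_from_export(lines: Iterable[str], key_attrs: Sequence[str]) -> Iterator[dict]:
    """Yield each exported item's primary key, still in typed attribute-value form."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)["Item"]
        yield {attr: item[attr] for attr in key_attrs}


def delete_batches(table: str, keys: Iterable[dict]) -> Iterator[dict]:
    """Group keys into RequestItems payloads of at most MAX_BATCH deletes each."""
    batch = []
    for key in keys:
        batch.append({"DeleteRequest": {"Key": key}})
        if len(batch) == MAX_BATCH:
            yield {table: batch}
            batch = []
    if batch:
        yield {table: batch}  # final partial batch
```

Only the key attributes go into each `DeleteRequest`; sending the full item would be rejected, since a delete is addressed purely by primary key.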