I want to create a process to transfer files from EC2 / EFS to Glacier - but with compression. Say there are directories with timestamps down to the hour. Every hour, I want a process that checks for directories older than 24 hours (configurable), zips up the files in each such directory, and moves the zip file to Glacier (then deletes both the original files and the zip file). Plus high reliability - some kind of failure/retry logic. And ideally something that uses an existing tool, or doesn't require a lot of external coding/logic.
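For concreteness, here's roughly the logic I'm trying to avoid hand-rolling - a minimal sketch run from cron, assuming the directories live under /mnt/efs/data, are named like YYYY-MM-DD-HH (UTC), and "Glacier" means uploading to an S3 bucket with the GLACIER storage class (the mount point, bucket, and key prefix are placeholders):

```python
import shutil
from datetime import datetime, timedelta, timezone
from pathlib import Path

import boto3

# Placeholders / assumptions: EFS mount point, hourly directory naming, target bucket
EFS_ROOT = Path("/mnt/efs/data")     # hypothetical EFS mount point
BUCKET = "my-archive-bucket"         # hypothetical bucket
MAX_AGE = timedelta(hours=24)        # the "older than 24 hours" cutoff, configurable

s3 = boto3.client("s3")

def run_once():
    now = datetime.now(timezone.utc)
    for directory in EFS_ROOT.iterdir():
        if not directory.is_dir():
            continue
        try:
            # Assuming directory names look like 2024-01-31-13 (UTC, per-hour)
            stamp = datetime.strptime(directory.name, "%Y-%m-%d-%H").replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # not a timestamped directory, skip it
        if now - stamp < MAX_AGE:
            continue

        # Zip the directory's contents, then upload with the GLACIER storage class
        archive = shutil.make_archive(str(directory), "zip", root_dir=str(directory))
        s3.upload_file(archive, BUCKET, f"hourly/{directory.name}.zip",
                       ExtraArgs={"StorageClass": "GLACIER"})

        # Delete only after the upload call has returned successfully
        shutil.rmtree(directory)
        Path(archive).unlink()

if __name__ == "__main__":
    run_once()  # invoked hourly from cron
```

The part this sketch doesn't solve is exactly the reliability piece - retries, and what happens if the box dies between the upload and the delete - which is what I'm hoping an existing tool handles.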
I've found a lot of tools that almost do this:
- AWS DataSync - moves files reliably - but no option to add compression
- AWS DataPipeline - transfers files with logic - but doesn't support EFS? (Or Glacier, but I suppose I could move the files to S3 and use a lifecycle rule to transition them to Glacier).
- some hybrid solution, like
- AWS DataSync with a cronjob that creates the zip file - but what about retries? (see the retry sketch after this list)
- AWS Step Functions workflows running a task on the EC2 box where EFS is mounted
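On the retry question for the cronjob option: boto3 will retry transient/throttled S3 calls on its own if you configure it, which covers part of the concern, but not a job that dies mid-run - a sketch of that config (the attempt count and mode are just example values):

```python
import boto3
from botocore.config import Config

# Example values only: "standard" mode retries throttling and transient errors
retry_config = Config(retries={"max_attempts": 10, "mode": "standard"})
s3 = boto3.client("s3", config=retry_config)
```

A crashed run would still need something outside the script to notice and re-drive it, which is roughly what Step Functions or Airflow would be for.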
One tool that I'm fairly sure would do it is Apache Airflow, which does workflows - but that requires a lot of manual coding, and I'm not sure AWS Step Functions would end up being any different anyway.
It seems like this should be a solved problem - schedule and compress a directory of files, move it to Glacier (with retry logic) - but I haven't found any really clean solutions yet. Is there something I'm missing?