We have an automated task that runs 3x daily to download and restore a full backup of a large DB from an EU S3 bucket to our on-prem server in the US. This was set up when the DB itself was small and transfer time/costs were minimal. Due to factors outside our control, the DB is now 70+ GB. Full backups are taken every 3 days, with diffs taken every 8 hours. Each run of the 3x daily task pulls the .bak files of the most recent full and diff. At download speeds of about 120 Mbps during the day, it can take several hours to pull this from S3, and at 3x daily, 365 days a year, at $0.09/GB transfer out of S3, the transfer costs alone are non-trivial (back of the envelope: 70 GB x 3 x 365 x $0.09 is roughly $6,900/year for the fulls alone, before counting the diffs).
There seem to be plenty of options to minimize cost and runtime here.
- We could cache the full .bak files locally and check whether the file already exists locally before pulling it from the EU S3 bucket. There is only 1 full .bak file every 3 days, yet there are 9 restores in that window, so 8 out of 9 of them could use a cached copy.
- We could additionally alter our backup strategy to take less frequent full backups, so that the full .baks would be downloaded even less often.
- Transfer from S3 to another AWS service within the same region is free, so restoring this DB to an EC2 instance in the EU would eliminate the transfer cost entirely. But the teams using the restored DBs currently need them hosted on-prem, so that's a longer-term thought.
- We could proactively replicate this DB from the EU bucket to a US bucket and download from there, but this would double our storage costs, replication itself incurs inter-region transfer charges, and transfer out of S3 costs the same regardless of region.
- The AWS backbone is faster than the public internet, and S3-to-CloudFront origin fetches are free, so in theory we could privately access these files through CloudFront and download from an edge location at faster speeds. CloudFront is also slightly cheaper than S3 at $0.085/GB. This seems like a lot of engineering work for a small cost savings, though. Our codebase is C#, and we're currently getting files using the AWS SDK for S3 - I haven't looked into how this could work with CloudFront (and I feel like I'm probably missing something here).
My plan is to implement a local cache (the first option above), which is a code-based solution on our side. This seems like a fairly common use case, though, and I'm wondering if I'm missing something obvious.
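For what it's worth, the cache check itself looks small. Here's a minimal sketch of what I have in mind, assuming each full backup has a unique (e.g. timestamped) object key so a filename match is a sufficient cache check, and using `TransferUtility` from the AWS SDK for .NET; the `bucket`, `key`, and `cacheDir` names are placeholders:

```csharp
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Transfer;

public static class BackupCache
{
    // Map an S3 object key to its path in the local cache directory.
    public static string CachePathFor(string cacheDir, string key) =>
        Path.Combine(cacheDir, Path.GetFileName(key));

    // Return the local path of the full backup, hitting S3 only when
    // no cached copy exists. Assumes full-backup keys are unique
    // (timestamped), so filename existence is a sufficient check.
    public static async Task<string> GetFullBackupAsync(
        IAmazonS3 s3, string bucket, string key, string cacheDir)
    {
        string localPath = CachePathFor(cacheDir, key);
        if (File.Exists(localPath))
            return localPath; // cache hit: skip the cross-region transfer

        await new TransferUtility(s3).DownloadAsync(localPath, bucket, key);
        return localPath;
    }
}
```

If key names were ever reused, comparing the local file's length against the `ContentLength` from `GetObjectMetadataAsync` before skipping the download would guard against stale or truncated cached copies.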