
I have some EC2 instances (Linux and Windows) with attached EBS volumes, an RDS MySQL database, and S3 buckets in an AWS account. I am in a situation where:

  • I won't have time to work on this project for about 6 months. I will definitely come back to it after this temporary hiatus.
  • During this time, I am looking for backup options that will reduce my AWS bill significantly.
  • I have access to offline storage (a Windows share) that exceeds the total size of all my EBS volumes.
  • During this period, I expect exactly two operations to be performed on the data: backing up from EC2/S3/RDS at the start of the hiatus, and restoring the backup to the AWS account at the end of it. I will not be trying to extract individual files from the backup.

I'm looking for guidance on how this can be achieved with the following considerations:

  • Cost - the cost of storage should be low.
  • Ease of use - these backups will need to be restored back to the same AWS account.
  • Configuration backup - I can reconfigure all EC2 instances etc. from scratch, but a way to back up the configuration would be ideal.
  • Time to backup & restore - the faster the better, obviously.

I understand there are going to be trade-offs (e.g. time vs. offline/online backup, or time vs. cost). Time to backup is the consideration I am willing to be most flexible on.

I have seen suggestions about using S3 Glacier or snapshot-to-S3 options, but it's not clear which will cost me more.

Let's assume these are the servers I have:

  • 2 × Linux r4.xlarge CentOS instances with an attached 1000GB volume each
  • 1 × Windows m4.large instance with an attached 500GB volume
  • 1 × RDS MySQL instance with 500GB of storage
  • 2 × S3 buckets with about 300GB in each
yagmoth555
Sarvo
  • Amazon Glacier? – davidgo Oct 28 '19 at 16:37
  • @davidgo https://aws.amazon.com/glacier/ – Sarvo Oct 28 '19 at 16:46
  • thanks, I know. Glacier was my suggested solution to the problem. Seems too obvious to me! – davidgo Oct 28 '19 at 17:35
  • @davidgo My only concern there is that I'm not really archiving the data, meaning in the near future I will have to download/restore all the data from glacier. My understanding is that kind of operation would be cost prohibitive with glacier pricing – Sarvo Oct 28 '19 at 17:42
  • AWS Glacier is legacy IMHO, S3 glacier / deep archive classes are better. Data can go into S3 deep archive class, but RDS / EC2 instance images can't. You could always export your data then set everything up from scratch in six months. If you set up using infrastructure as code it would be fairly simple. The data can be stored outside of AWS or in S3 deep archive. Also, I think this is a very valid question and should be re-opened. – Tim Oct 28 '19 at 18:19

1 Answer


I did a quick calculation which puts your current infrastructure at approx US$940/month, but that doesn't include bandwidth. Let's say US$1000.
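For what it's worth, that ballpark can be roughly reproduced. All the hourly and per-GB rates below are assumptions based on the 2019 us-east-1 on-demand price list (including the db.m4.large guess for the unspecified RDS instance), so treat this as a sketch rather than a quote:

```python
# Rough monthly cost of the running infrastructure.
# All rates are ASSUMED 2019-era us-east-1 on-demand prices, not current ones.
HOURS = 730  # hours in an average month

ec2 = (2 * 0.266 + 1 * 0.192) * HOURS  # 2x r4.xlarge (Linux) + 1x m4.large (Windows)
ebs = (2 * 1000 + 500) * 0.10          # 2.5 TB of gp2 at $0.10/GB-month
rds = 0.175 * HOURS + 500 * 0.115      # assumed db.m4.large MySQL rate + 500 GB storage
s3  = 600 * 0.023                      # 600 GB in the standard class

print(f"~${ec2 + ebs + rds + s3:.0f}/month")
```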

I'll assume that your volume sizes reflect actual data. That's a fair bit of data, and that volume really makes a difference to your options because AWS bandwidth is so expensive. If you had less data the calculation could possibly be different, making downloading your data more cost effective - but only for relatively small amounts of data.

I'll focus on cost reduction rather than ease or speed of spinning the infrastructure back up. I'll also assume us-east-1 / Virginia.

Let's address this service by service:

S3

S3 with 600GB of storage is only costing you about $14/month, so it's barely worth doing anything. Your options are:

  • Store in the S3 IA class ($7.50/month)
  • Store in the S3 Glacier Deep Archive class ($0.60/month)
  • Download it to your local storage (about $52 to download, so more expensive than keeping it in S3 for six months)

Recommended: S3 Glacier Deep Archive class
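A quick back-of-the-envelope comparison of those options over the six-month hiatus (per-GB rates are assumed us-east-1 list prices; the one-off download cost is compared against six months of storage):

```python
GB, MONTHS = 600, 6

standard     = GB * 0.023   * MONTHS  # leave it where it is
ia           = GB * 0.0125  * MONTHS  # S3 Infrequent Access
deep_archive = GB * 0.00099 * MONTHS  # S3 Glacier Deep Archive
download     = GB * 0.09              # one-off data-transfer-out charge

for name, cost in [("standard", standard), ("IA", ia),
                   ("deep archive", deep_archive), ("download", download)]:
    print(f"{name}: ${cost:.2f}")
```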

EC2 / EBS

Turning off the instances is the obvious step, so I'll look at the EBS volumes.

  • EBS snapshots to S3: 2.5TB is $125/month
  • Download the contents of the drives: $227 in bandwidth
  • Compress and store in the S3 Deep Archive class: $2.53 per month

S3 Deep Archive wins again.
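The EBS figures follow from the same arithmetic (rates again assumed from the 2019 us-east-1 price list; small differences from the numbers above are GB-vs-GiB rounding):

```python
GB = 2500  # 2x 1000 GB + 1x 500 GB

snapshot     = GB * 0.05     # EBS snapshot storage, per month
download     = GB * 0.09     # one-off data transfer out
deep_archive = GB * 0.00099  # compressed volume contents in S3 Deep Archive, per month

print(snapshot, download, round(deep_archive, 2))
```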

RDS

  • RDS backups: 500GB @ $0.095/GB is $47.50 per month
  • Back up / dump the data to a flat file (e.g. mysqldump) and store it in S3 Deep Archive: $0.50/month
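Same calculation for RDS - keeping backup storage around versus parking one mysqldump file in Deep Archive (rates assumed as above):

```python
GB = 500

rds_backup   = GB * 0.095    # RDS backup storage beyond the free allotment, per month
deep_archive = GB * 0.00099  # flat mysqldump file in S3 Deep Archive, per month

print(f"${rds_backup:.2f}/month vs ${deep_archive:.2f}/month")
```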

S3 Glacier Retrieval Costs

Bulk tier retrieval costs $0.0025/GB. You have about 3.6TB which makes the cost about $9.

Summary

Basically, Glacier Deep Archive is the best option for all of your data. Turn off the EC2/RDS instances, copy and archive the data, and delete the instances, volumes, snapshots, and RDS backups, and your bill drops from approximately $1000 to approximately $3.63 per month.
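The $3.63 is just the three Deep Archive line items added together:

```python
s3_data  = 0.60  # 600 GB of S3 bucket data
ebs_data = 2.53  # 2.5 TB of compressed EBS contents
rds_dump = 0.50  # 500 GB mysqldump
print(f"${s3_data + ebs_data + rds_dump:.2f}/month")
```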

The big caveat with S3 Deep Archive is that data is billed for a minimum of 180 days regardless of how long it's actually stored. The Glacier class costs 4× more but has a 90-day minimum. The next tier up is the IA class, which is another 3× more, for a total of 12× the Deep Archive price.

Note that I have not catered for the S3 API request costs. These are usually relatively low if you're uploading a smaller number of large files, especially if you set your multipart chunk size higher.

For example, in ~/.aws/config:

[default]
s3 =
  max_concurrent_requests = 20
  max_queue_size = 1000
  multipart_threshold = 64MB
  multipart_chunksize = 128MB
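As a hypothetical illustration of why request costs stay small: with the 128MB chunk size above, ~3.6TB of uploads is only about 28,000 PUT requests, and at an assumed Deep Archive rate of $0.05 per 1,000 PUTs that is under $1.50:

```python
total_mb = 3_600_000   # ~3.6 TB of uploads
chunk_mb = 128         # multipart_chunksize from the config above
put_per_1000 = 0.05    # assumed S3 Deep Archive PUT price per 1,000 requests

requests = total_mb / chunk_mb
cost = requests / 1000 * put_per_1000
print(f"{requests:.0f} requests, ${cost:.2f}")
```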

There may also be some bandwidth charges for moving data around. If anyone points out anything significant I'll add it to my answer.

Tim