
On the Amazon AWS platform, is there a way of exporting an EBS volume to an external disk?

I.e. a backup outside of Amazon's infrastructure?

From what I have read so far:

  1. Amazon's "Import/Export (Disk)" service supports exporting data from S3 buckets

  2. EBS snapshots are implicitly stored in some form of opaque S3 bucket, but this bucket is not visible to AWS admins

So it appears there is no way to export EBS snapshots. Has anyone had luck with this?

Thanks,

EDIT: I have ~2.5TB of MongoDB data that I need to copy locally (i.e. onto a 2.5" external drive). Downloading that data would cost ~$220 USD ($0.09/GB) and take ~10 days at 3MB/s (assuming no network issues). That is why I'm trying to go down the Amazon Import/Export route. My mongo instances use LVM/XFS, so I have the ability to generate snapshots.
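
The download estimate above can be checked with a short calculation. This is a rough sketch that assumes decimal units (1 TB = 1000 GB, 1 GB = 1000 MB) and a sustained transfer rate; `estimate_transfer` is a hypothetical helper name.

```shell
# Rough estimate of egress cost and transfer time.
# $1 = size in GB, $2 = price per GB in USD, $3 = throughput in MB/s
estimate_transfer() {
  awk -v gb="$1" -v price="$2" -v mbps="$3" 'BEGIN {
    cost = gb * price                 # total egress charge
    days = gb * 1000 / mbps / 86400   # seconds of transfer converted to days
    printf "cost_usd=%.2f days=%.1f\n", cost, days
  }'
}

estimate_transfer 2500 0.09 3   # prints: cost_usd=225.00 days=9.6
```

That is in line with the figures quoted, before allowing for retries or slowdowns.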

mils
  • It's not possible to export EBS snapshots, and it's deliberate on the part of AWS. If you want to back up an instance you can probably run software on your instance to do that backup, but you can't backup your EBS snapshots. – Tim May 30 '18 at 08:12

2 Answers


EBS snapshots are stored in S3, but they're managed by EBS and in buckets that you aren't able to access.

While this sounds confusing, there is a good explanation.

EBS snapshots are not stored individually. They rely on information provided by the EBS infrastructure so that they only capture blocks that have changed since the previous snapshot. (Take two consecutive snapshots of the same volume, and almost inevitably the second will complete faster than the first, for this reason.) The snapshot subsystem backs up only those changed blocks, and creates logical links to the blocks in previous snapshots that are needed to restore the entire volume. Later, if those previous snapshots are deleted, only the blocks that are not linked to any later snapshot are purged. This provides two advantages: faster snapshots, and the ability to purge old snapshots without needing to worry about later "incremental" backups that depend on previous backups. EBS manages that aspect, keeping what is needed and purging what is not (and not billing you once unneeded data is purged).

This setup leads to dramatic storage efficiency and cost savings, because you're only paying to store the differences. If you compare the combined size of your snapshots with the number of GB of snapshot storage you are billed for, the billed figure should be lower, and the more snapshots you have of the same volumes, the larger that gap can be.

If the snapshots were stored individually in S3, the cost would be much higher.

However... there is a way to export an EBS snapshot offsite, but it's a manual process.

To do this, you need a spare Linux EC2 instance. The simplified version of the process:

  • Boot the instance
  • Create an EBS volume from the snapshot
  • Attach the new volume to the instance, but don't mount it
  • Access the raw data on the volume using the assigned block device file, e.g. /dev/xvdf.
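
The volume-creation and attachment steps above can be sketched with the AWS CLI. This is a minimal sketch rather than a definitive procedure: `attach_snapshot_volume` is a hypothetical helper name, the IDs in the usage comment are placeholders, and it assumes the CLI is already configured with permission to create and attach volumes. The volume has to be created in the same availability zone as the instance.

```shell
# Sketch only: restore a snapshot to a new volume and attach it, unmounted.
# attach_snapshot_volume is a hypothetical helper, not an AWS command.
# Usage: attach_snapshot_volume <snapshot-id> <instance-id> <az> [device]
#   e.g. attach_snapshot_volume snap-0123456789abcdef0 i-0123456789abcdef0 us-east-1a
attach_snapshot_volume() {
  snapshot_id=$1
  instance_id=$2
  az=$3
  device=${4:-/dev/sdf}

  # Create a volume from the snapshot and capture its ID.
  volume_id=$(aws ec2 create-volume \
    --snapshot-id "$snapshot_id" \
    --availability-zone "$az" \
    --query VolumeId --output text) || return 1

  # Wait until the volume is ready, then attach it without mounting.
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1
  aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$instance_id" --device "$device" >/dev/null || return 1
  aws ec2 wait volume-in-use --volume-ids "$volume_id" || return 1

  echo "$volume_id"
}
```

A device requested as /dev/sdf commonly shows up on the instance as /dev/xvdf, though on some instance types it appears under an NVMe name instead, so it is worth confirming with `lsblk` before reading from it.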

From here, you can use standard tools like dd or pv to read the raw data stream from the device, and send it where you want it. For example, let's assume you have an off-site SSH server that is accessible from the instance.

$ sudo pv -pterab /dev/xvdf | \
  pbzip2 -9 | \
  ssh user@offsite.example.com \
  'cat > /some/large/disk/my-snapshot.bz2'

Line 1 reads from the block device and shows a progress indicator.

Line 2 compresses the raw data using multicore bzip2 at maximum compression.

Line 3 establishes an SSH connection to the offsite server, piping the compressed output to it.

Line 4 writes the compressed disk image file to a file on the remote machine.

Bringing the volume back into AWS would involve creating an empty volume and reversing the process, piping the file back in, decompressing it, and writing it to a block device.
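
A minimal sketch of that reverse direction, under the same assumptions as the export command (an SSH-reachable offsite host and `pbzip2` installed on the instance). `restore_image` is a hypothetical helper name, and the target must be an empty, unmounted volume at least as large as the original, because `dd` overwrites whatever is on it.

```shell
# Sketch only: stream the compressed image back and write it to a block device.
# $1 = SSH target, $2 = path of the image on the remote host, $3 = block device
restore_image() {
  remote=$1
  remote_file=$2
  device=$3

  # Read the file remotely, decompress locally, write raw bytes to the device.
  ssh "$remote" "cat '$remote_file'" \
    | pbzip2 -dc \
    | sudo dd of="$device" bs=1M
}

# Example (placeholders taken from the export command above):
#   restore_image user@offsite.example.com /some/large/disk/my-snapshot.bz2 /dev/xvdf
```

Double-check the device name before running it, since writing to the wrong device destroys its contents.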

Note, however, that disk snapshots are not usually the best approach for backups. They are fast and easy, but relying on snapshots is a sign that your recovery strategy should be reconsidered.

If the volume in question contains a database, using logical backup tools for offsite backup is probably a tidier solution. If the volume contains assets, you can use tarballs or rsync. If the volume contains your application code, you really need an infrastructure that allows you to repeatably build working servers from scratch from version-controlled source, through automation. This requires a change of mindset and has a significant up-front investment in time, but will serve you much better over the long haul.
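
For the database case, a logical backup can be streamed offsite in much the same way as the raw image. The sketch below assumes MongoDB (as in the question) with `mongodump` able to reach the server using its default connection settings; `dump_offsite` is a hypothetical helper name. With `--archive` and no file name, `mongodump` writes a single archive to standard output, and `--gzip` compresses it.

```shell
# Sketch only: stream a compressed logical MongoDB backup to an offsite host.
# $1 = SSH target, $2 = destination path on the remote host
dump_offsite() {
  remote=$1
  remote_file=$2

  # mongodump writes the gzip-compressed archive to stdout; ssh stores it remotely.
  mongodump --archive --gzip \
    | ssh "$remote" "cat > '$remote_file'"
}

# Example (hypothetical destination path):
#   dump_offsite user@offsite.example.com /some/large/disk/mongo.archive.gz
```

Such an archive would be restored with `mongorestore --archive --gzip` reading from the file.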

Michael - sqlbot
  • Thanks for the thorough response, Michael. Somehow I'm not surprised this is a little tricky. It is indeed a database (mongo), and I'm happy to bring it down for the purpose of this procedure. Do you think it would be possible to `dd` straight to an S3 bucket, so that I can mail Amazon a 2.5" disk and get a physical copy? Thanks – mils Jun 01 '18 at 03:09
  • Not directly or natively, because dd works with block devices and filehandles, and these aren't what S3 exposes. It would be possible to create a tool that does this, but it would require some significant care and attention. I think there are better solutions, though. – Michael - sqlbot Jun 01 '18 at 12:44
  • Can you create a backup using mongodump? Then try using either `pbzip2 -9` and `pixz -9` to compress the backup, and see which one creates a smaller resulting file. Either way, the resulting file should be only a fraction of the size of the DB. While it is admittedly apples vs oranges, I find pixz can reduce other backups, such as those from mysqldump, to less than 1/10 their original size (300 GB >> 30 GB or less!). If you can get the backup down to 10% of its original size, the problem becomes much simpler and less of a daunting task, and your costs and transfer time decrease. – Michael - sqlbot Jun 01 '18 at 12:49
  • mongodump has a [--gzip](https://docs.mongodb.com/manual/reference/program/mongodump/#cmdoption-mongodump-gzip) option, so I believe I can do this inline. But the thing is, with WiredTiger the data is already compressed on disk, so I don't know whether or not there will be a space reduction. But I don't have any better options, so I guess I'll try :) – mils Jun 03 '18 at 23:30

At this time EBS exports aren't supported. You can review the AWS Import/Export document on this topic. Imports to EBS are supported, but only exports from S3 are supported.

You could export your data to S3 first, but this may not be ideal for your given need.

B. Miller
  • This is where I get confused: EBS snapshots are supposedly already in S3 – mils May 30 '18 at 02:13
  • They are stored in S3 but not accessible like normal S3 data. S3 is more cost effective and durable (important for snapshotting) both for Amazon and the customer. You'd have to export the specific **data** you want to S3. There is no way to move a disk snapshot from this hidden location to a regular S3 bucket. – B. Miller May 30 '18 at 02:18