I have an EC2 instance with EBS volumes A and B attached to it, and I want to copy/replicate/sync the data from a specific folder in EBS A to EBS B.

EBS A is the primary volume which hosts application installation data and user data, and I'm looking to effectively backup the user data (which is just a specific directory) to EBS B in the event that the application install gets corrupted or needs to be blown away. That way I can simply stand up a new EC2 with a new primary EBS, call it C, attach EBS B to it, and push the user data from EBS B into EBS C.

I am using Amazon Linux 2 and have already gone through the process of formatting and mounting the backup EBS. I can manually copy data from EBS A to EBS B, but I was hoping someone could point me towards best practices for keeping the directory data in sync between the two volumes.

I have found recommendations for rsync, a cron task, and Gluster for similar use cases. Would it be considered good practice to use one of these for my use case?

Tikiyetti

1 Answer

While you can use rsync, a better alternative is Data Lifecycle Manager (DLM), which automates the creation of EBS snapshots.

The reason it's better is that you can retain a fixed number of snapshots taken at a fixed time interval, so you aren't forced to restore the latest one (important if the "current" data is corrupted).
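As a sketch, a DLM policy expressing "snapshot hourly, keep the last 24" might look like the following policy-details fragment (the tag key/value and schedule name are illustrative; the policy targets any volume carrying that tag):

```json
{
  "ResourceTypes": ["VOLUME"],
  "TargetTags": [{ "Key": "Backup", "Value": "user-data" }],
  "Schedules": [
    {
      "Name": "HourlyUserData",
      "CreateRule": { "Interval": 1, "IntervalUnit": "HOURS" },
      "RetainRule": { "Count": 24 }
    }
  ]
}
```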

To use this most effectively, I would separate the boot volume from the application/data volume(s). So you could just restore the snapshot, spin up a new instance, and mount the restored volume to it.

guest
  • Fortunately I already have the boot volume as a separate volume. The main reason I didn't go with this approach initially is because if I were to restore from an earlier snapshot of the full volume, I would be losing the delta in user data too. That's why I am trying to break out just the user data from the application data. Otherwise yes, I would have gone with DLM. Unless...Can it snapshot specific folders of a volume, or does it have to be the full volume? – Tikiyetti May 16 '19 at 18:08
  • @Tikiyetti - it snapshots the entire volume. But if you need to track user data separately, it could always go on its own volume. – guest May 16 '19 at 19:07
  • The way snapshots work is one of the cooler parts of EBS: it's truly a point-in-time snapshot of the volume, based (as far as I can tell) on recording a list of immutable block IDs that represent the data on the volume. As a result, multiple snapshots are very cheap, since they only record the differences from the previous snapshot. And there's no race condition like you'd get from `rsync` trying to copy files that may be changing underneath it. – guest May 16 '19 at 19:09
  • Ok so in my case then, since I'd be blowing away the original EBS with app/user data, and a full snapshot/backup would still contain the user data anyway...I may as well just take incremental snapshots of the whole volume, and if I need to blow away and restore, I can always be selective about which folders I take from the snapshot, right? Because I may not want the app data (possibly corrupt) from that snapshot, but I will want the user data. That's how I understand it at least. – Tikiyetti May 16 '19 at 22:53
  • @Tikiyetti - it really depends on how you want to structure your volumes. You could have separate volumes for application and user data, but that might be more of a management overhead than you want. But definitely yes for frequent incremental snapshots. – guest May 17 '19 at 13:27