Backing up symlinks using AWS s3 sync

Question

I'm attempting to back up our system using the aws s3 sync command. However, this will either back up the entire directory behind a symlink (the default behaviour) or not back up the symlink at all.

I'd like some way of backing up the symlink so it can be restored from S3 if need be.

I don't want to archive the entire directory first, or else I'll lose the ability to back up only the changed files.

My current thought is to scan the directory for symlinks and create metadata files containing each symlink's target, which could be read after a restore to rebuild the symlinks, but I'm not quite sure how to do this.

Any advice would be very welcome. Thanks in advance.

score 3 · Answer 1 · answered Nov 23 '17 at 19:03

As is, S3 has no standard way to represent a symlink. Note that you could decide on a custom representation and store it in the metadata of an empty S3 object, but you would be on your own. AFAIK, aws s3 doesn't do that.
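One possible shape for such a custom representation, following the manifest idea from the question, is sketched below. It is only an illustration, not something aws s3 provides: the function names and the `.symlinks.json` file name are made up for this example, and it assumes a POSIX system. The manifest is an ordinary file, so it is uploaded along with everything else and can be replayed after a restore.

```python
import json
import os


def save_symlinks(root, manifest_name=".symlinks.json"):
    """Record every symlink under root as {relative path: link target}.

    manifest_name is a hypothetical file name chosen for this sketch.
    """
    links = {}
    # followlinks=False: linked directories are listed but not descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links[os.path.relpath(path, root)] = os.readlink(path)
    with open(os.path.join(root, manifest_name), "w") as fh:
        json.dump(links, fh, indent=2, sort_keys=True)
    return links


def restore_symlinks(root, manifest_name=".symlinks.json"):
    """Recreate the symlinks listed in the manifest, skipping existing paths."""
    with open(os.path.join(root, manifest_name)) as fh:
        links = json.load(fh)
    for rel, target in links.items():
        path = os.path.join(root, rel)
        if os.path.lexists(path):
            # Never overwrite something that is already there.
            continue
        os.makedirs(os.path.dirname(path), exist_ok=True)
        os.symlink(target, path)
    return links
```

Run `save_symlinks` on the directory before syncing and `restore_symlinks` on it after downloading. Targets are stored exactly as written in the link, so absolute targets only resolve if the same paths exist on the machine you restore to.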

Now, for the purpose of backing up to S3 (and Glacier), you may want to take a look at OpenDedup. It uses the same type of rolling checksum as rsync to minimize the actual storage used (and the bandwidth).

I've been writing a lot of custom cp -rl and rsync scripts to back up my own system to local drives, but was always frustrated by the unnecessary extra storage taken up by the many duplicate files I may have. Imagine what happens in those simple schemes when you rename a directory (mv dirA dirB): the next backup typically stores a brand new copy of that dir.

With OpenDedup (and other similar systems, such as bup, zpaq, etc.), the content is stored uniquely (thanks to the rolling checksum approach). I like that.

score 1 · Answer 2 · answered Dec 10 '17 at 14:10

Right now, Amazon S3 does not support symbolic links. It will follow them when uploading from the local disk to S3. According to the AWS documentation, the contents of the symlink are copied or sync’d under the name of the symlink.

The rsync command does have options for symbolic links. One of them, --links (-l), copies each symlink as a symlink and preserves its target exactly; --copy-links does the opposite and copies the file or directory the link points to. So if your symlinks use absolute paths (/my/absolute/path), the copied symlink will still point to that path on your local box. If you use relative paths (../../path), the symlink will point to that path relative to its new location.

Rsync would therefore be a way to preserve your symlinks so they still work after you restore the files to your local box.

Another method would be to use an AWS S3 sync or backup service, such as NetApp’s Cloud Sync, which would catalog your data with each operation. Each service provider offers different features, so how symbolic links would be handled depends on the vendor chosen.

Note there are now options `--follow-symlinks` | `--no-follow-symlinks`, default is to follow when uploading. — Chris, Apr 04 '21 at 14:16
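As a rough sketch of how those flags might be used from a backup script, the snippet below builds the sync command with symlinks skipped, so that the links can be handled separately. The helper names, the source path and the bucket name are placeholders for illustration, and it assumes the aws CLI is installed and configured.

```python
import subprocess


def build_sync_command(src, dest, follow_symlinks=False):
    """Return the argv list for `aws s3 sync` with the chosen symlink flag."""
    flag = "--follow-symlinks" if follow_symlinks else "--no-follow-symlinks"
    return ["aws", "s3", "sync", src, dest, flag]


def run_sync(src, dest, follow_symlinks=False):
    """Run the sync and raise CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_sync_command(src, dest, follow_symlinks), check=True)


if __name__ == "__main__":
    # Hypothetical source directory and bucket; replace with your own.
    print(" ".join(build_sync_command("/srv/data", "s3://example-bucket/backup")))
```

With `--no-follow-symlinks` the links themselves are not uploaded, so something else still has to record where they pointed.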
