Note: This is a "theoretical" question, as I haven't got that kind of data yet.
If you have a distributed file system spanning a dozen or more servers and TBs of data, how do you back it up? Local tape drives aren't an option, as I'm renting the servers and don't have physical access to them.

The way I see it, I simply must have a backup cluster proportional in size to the source cluster. Sending all of that data over the network in parallel would probably saturate it and drag throughput down, but the backups all have to be taken at the same point in time, so round-robin backups don't seem to make sense either.

One way around this would be to use only a small portion of the (in my case, large) drives for data and keep the rest free for rotating local LVM snapshots. Unfortunately, that kind of backup is useless if a server gets compromised. Are there any other options for creating a point-in-time backup that doesn't kill the network?
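For context, here is roughly what I mean by rotating local LVM snapshots; this is only a sketch, and the volume group `vg0`, logical volume `data`, snapshot size and retention count are made-up values, not my actual setup:

```python
#!/usr/bin/env python3
"""Rotate local LVM snapshots of a data volume (illustrative sketch)."""
import datetime
import subprocess

VG = "vg0"          # assumed volume group name
LV = "data"         # assumed logical volume holding the DFS data
SNAP_SIZE = "50G"   # copy-on-write space reserved per snapshot (assumption)
KEEP = 3            # number of snapshots to retain (assumption)


def list_snapshots():
    """Return snapshot LV names whose origin is our data LV, oldest first."""
    out = subprocess.run(
        ["lvs", "--noheadings", "-o", "lv_name", "-S", f"origin={LV}", VG],
        capture_output=True, text=True, check=True,
    ).stdout
    # Timestamped names sort chronologically.
    return sorted(line.strip() for line in out.splitlines() if line.strip())


def create_snapshot():
    """Create a new point-in-time snapshot named after the current timestamp."""
    name = "snap-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    subprocess.run(
        ["lvcreate", "--snapshot", "--size", SNAP_SIZE,
         "--name", name, f"{VG}/{LV}"],
        check=True,
    )


def prune_snapshots():
    """Remove the oldest snapshots beyond the retention count."""
    for old in list_snapshots()[:-KEEP]:
        subprocess.run(["lvremove", "-f", f"{VG}/{old}"], check=True)


if __name__ == "__main__":
    create_snapshot()
    prune_snapshots()
```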
[EDIT] SOLUTION:
1) Replicate the entire data set in (near) real time to one large local backup server, so bandwidth usage and I/O are spread over the day, and local bandwidth is usually "free" (see the replication sketch after this list).
2) Create the real backup from that machine and send it off-site. With all the data in one place, a differential backup is easy, which saves billable bandwidth (see the differential-backup sketch below).
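A rough sketch of step 1, assuming the nodes are reachable over rsync/SSH; the host names, paths and sync interval are invented for illustration:

```python
#!/usr/bin/env python3
"""Continuously mirror each DFS node onto one large local backup server."""
import os
import subprocess
import time

NODES = ["node01", "node02", "node03"]  # hypothetical cluster nodes
SRC_PATH = "/srv/dfs/"                  # hypothetical data directory on each node
DEST_ROOT = "/backup/mirror"            # local mirror on the backup server
INTERVAL = 15 * 60                      # seconds between sync passes (assumption)


def sync_node(node: str) -> None:
    """Mirror one node's data directory; rsync only transfers the deltas."""
    dest = f"{DEST_ROOT}/{node}/"
    os.makedirs(dest, exist_ok=True)
    subprocess.run(
        ["rsync", "-a", "--delete", f"{node}:{SRC_PATH}", dest],
        check=True,
    )


if __name__ == "__main__":
    while True:
        # Nodes are synced one at a time to avoid hammering the local network.
        for node in NODES:
            sync_node(node)
        time.sleep(INTERVAL)
```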
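And a sketch of step 2, using rsync's `--link-dest` so unchanged files are skipped and only the differences cross the billable link; the off-site host and paths are placeholders:

```python
#!/usr/bin/env python3
"""Push a dated, differential copy of the local mirror to an off-site host."""
import datetime
import subprocess

MIRROR = "/backup/mirror/"               # consolidated local copy from step 1
OFFSITE_HOST = "offsite.example.com"     # hypothetical off-site host
OFFSITE_PATH = "/backups"                # hypothetical target directory

today = datetime.date.today().isoformat()

# --link-dest makes files that are unchanged since the previous run into hard
# links on the receiver instead of re-sending them, so only the differences
# consume billable bandwidth and remote disk space. On the very first run the
# 'latest' link won't exist yet and rsync simply does a full copy.
subprocess.run(
    ["rsync", "-a", "--delete", "--link-dest", "../latest",
     MIRROR, f"{OFFSITE_HOST}:{OFFSITE_PATH}/{today}/"],
    check=True,
)

# Point 'latest' at the run that just finished (assumes SSH access).
subprocess.run(
    ["ssh", OFFSITE_HOST, f"ln -sfn {today} {OFFSITE_PATH}/latest"],
    check=True,
)
```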