Note: This is a "theoretical" question, as I haven't got that kind of data yet.
If you have a distributed file system spanning a dozen or more servers and TBs of data, how do you back it up? Local tape drives aren't an option, as I'm renting the servers and don't have physical access to them.

The way I see it, I simply must have a backup cluster proportional in size to the source cluster. Sending all of that data over the network in parallel would probably saturate it and drag throughput down, but the backups all have to be taken at the same point in time, so round-robin backups don't seem to make sense either.

One way around this would be to use only a small portion of the (in my case, large) drives for data and keep the rest free for rotating local LVM snapshots. Unfortunately, that kind of backup is useless if a server gets compromised. Are there any other options for creating a point-in-time backup that doesn't kill the network?
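For context, here is roughly what I mean by rotating local LVM snapshots; this is only a sketch, and the volume group `vg0`, logical volume `data`, snapshot size and retention count are made-up values, not my actual setup:

```python
#!/usr/bin/env python3
"""Rotate local LVM snapshots of a data volume (illustrative sketch)."""
import datetime
import subprocess

VG = "vg0"          # assumed volume group name
LV = "data"         # assumed logical volume holding the DFS data
SNAP_SIZE = "50G"   # copy-on-write space reserved per snapshot (assumption)
KEEP = 3            # number of snapshots to retain (assumption)


def list_snapshots():
    """Return snapshot LV names whose origin is our data LV, oldest first."""
    out = subprocess.run(
        ["lvs", "--noheadings", "-o", "lv_name", "-S", f"origin={LV}", VG],
        capture_output=True, text=True, check=True,
    ).stdout
    # Timestamped names sort chronologically.
    return sorted(line.strip() for line in out.splitlines() if line.strip())


def create_snapshot():
    """Create a new point-in-time snapshot named after the current timestamp."""
    name = "snap-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    subprocess.run(
        ["lvcreate", "--snapshot", "--size", SNAP_SIZE,
         "--name", name, f"{VG}/{LV}"],
        check=True,
    )


def prune_snapshots():
    """Remove the oldest snapshots beyond the retention count."""
    for old in list_snapshots()[:-KEEP]:
        subprocess.run(["lvremove", "-f", f"{VG}/{old}"], check=True)


if __name__ == "__main__":
    create_snapshot()
    prune_snapshots()
```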
[EDIT] SOLUTION:
1) Replicate the entire data set in (near) real time to one large local backup server, so bandwidth usage and I/O are spread over the day, and local bandwidth is usually "free" (see the replication sketch after this list).
2) Create the real backup from that machine and send it off-site. With all the data in one place, a differential backup is easy, which saves billable bandwidth (see the differential-backup sketch below).
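A rough sketch of step 1, assuming the nodes are reachable over rsync/SSH; the host names, paths and sync interval are invented for illustration:

```python
#!/usr/bin/env python3
"""Continuously mirror each DFS node onto one large local backup server."""
import os
import subprocess
import time

NODES = ["node01", "node02", "node03"]  # hypothetical cluster nodes
SRC_PATH = "/srv/dfs/"                  # hypothetical data directory on each node
DEST_ROOT = "/backup/mirror"            # local mirror on the backup server
INTERVAL = 15 * 60                      # seconds between sync passes (assumption)


def sync_node(node: str) -> None:
    """Mirror one node's data directory; rsync only transfers the deltas."""
    dest = f"{DEST_ROOT}/{node}/"
    os.makedirs(dest, exist_ok=True)
    subprocess.run(
        ["rsync", "-a", "--delete", f"{node}:{SRC_PATH}", dest],
        check=True,
    )


if __name__ == "__main__":
    while True:
        # Nodes are synced one at a time to avoid hammering the local network.
        for node in NODES:
            sync_node(node)
        time.sleep(INTERVAL)
```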
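And a sketch of step 2, using rsync's `--link-dest` so unchanged files are skipped and only the differences cross the billable link; the off-site host and paths are placeholders:

```python
#!/usr/bin/env python3
"""Push a dated, differential copy of the local mirror to an off-site host."""
import datetime
import subprocess

MIRROR = "/backup/mirror/"               # consolidated local copy from step 1
OFFSITE_HOST = "offsite.example.com"     # hypothetical off-site host
OFFSITE_PATH = "/backups"                # hypothetical target directory

today = datetime.date.today().isoformat()

# --link-dest makes files that are unchanged since the previous run into hard
# links on the receiver instead of re-sending them, so only the differences
# consume billable bandwidth and remote disk space. On the very first run the
# 'latest' link won't exist yet and rsync simply does a full copy.
subprocess.run(
    ["rsync", "-a", "--delete", "--link-dest", "../latest",
     MIRROR, f"{OFFSITE_HOST}:{OFFSITE_PATH}/{today}/"],
    check=True,
)

# Point 'latest' at the run that just finished (assumes SSH access).
subprocess.run(
    ["ssh", OFFSITE_HOST, f"ln -sfn {today} {OFFSITE_PATH}/latest"],
    check=True,
)
```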