1

I have a relatively large video library of about 50TB that I need to back up at least weekly. Currently, the video content is stored across an array of RAIDed 3TB hard drives. I estimate the amount of new content at about 300GB per week.

A cloud solution is out as it is prohibitively expensive for this amount of storage.

How would you suggest backing up this digital library? What about LTO-5 tapes?

David542
  • How much of the data changes each week? How big is the average file in this library, and can they be compressed? How much is "duplicate" content (so that a de-dupe file system might be useful)? – Mark Henderson Feb 08 '12 at 00:44
  • We need more information. What's your daily or weekly change volume? What does your backup window look like? How is the data stored (file system, databases, etc)? What's the storage platform? What does your restore SLA look like? Do you need to restore all the data to return to business or can you get away w/ restoring a subset initially while taking longer to restore the entire corpus? – Evan Anderson Feb 08 '12 at 00:46
  • @EvanAnderson: the video would just need to be 'recoverable': it wouldn't necessarily be a time-sensitive issue, as long as all the video files were there. – David542 Feb 08 '12 at 00:49
  • Do you want to build something, or do you want to buy something? Please provide more details about what you're using to store this data. – LVLAaron Feb 08 '12 at 00:52
  • I believe the content is stored on 3TB consumer-grade hard drives. I would be open to either building or buying something. – David542 Feb 08 '12 at 00:57

3 Answers

6

By my math, you have full turnover in about 170 weeks, or 3 years.
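For anyone who wants to check that figure, here is a quick sketch of the arithmetic. It assumes decimal units (1 TB = 1000 GB) and a constant 300GB of new content per week; the function name is just for illustration.

```python
# Figures from the question: ~50 TB library, ~300 GB of new content per week.
# Assumption: decimal units (1 TB = 1000 GB) and a constant weekly intake.
LIBRARY_GB = 50 * 1000
WEEKLY_NEW_GB = 300


def turnover_weeks(library_gb: float, weekly_gb: float) -> float:
    """Weeks of new content needed to equal the current library size."""
    return library_gb / weekly_gb


weeks = turnover_weeks(LIBRARY_GB, WEEKLY_NEW_GB)
years = weeks / 52
print(f"{weeks:.0f} weeks, about {years:.1f} years")
```

That works out to roughly 167 weeks, a little over 3 years, which is where the "about 170 weeks" above comes from.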

The key thing in figuring out your recovery time objective (RTO) is what data your users need to be productive. Since these are video files, I'm guessing that restoring the most recent data first will get them productive while you bring the rest of the library online. But only you know how often the older stuff is accessed.

You'd be surprised how well tape can handle a job like this. Video files are big and long, and if they're not heavily fragmented they can stream to tape very fast. Just as importantly, they will restore very fast, since a restore is mostly big, sequential writes. A weekly net-change tape, cross-referenced with a database that tracks what's on each tape, could give you a sizeable offline archive if you want.
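Real backup software keeps this catalog for you, but to illustrate the idea of a database that tracks what's on each tape, here is a minimal sketch using SQLite. The table layout, tape labels, and file paths are all made up for the example.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) a tiny tape catalog. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tape_files ("
        " tape_label TEXT NOT NULL,"
        " file_path TEXT NOT NULL,"
        " size_bytes INTEGER NOT NULL,"
        " written_on TEXT NOT NULL)"  # ISO date, so it sorts as text
    )
    return conn


def record_file(conn, tape_label, file_path, size_bytes, written_on):
    """Note that one file was written to one tape on a given date."""
    conn.execute(
        "INSERT INTO tape_files VALUES (?, ?, ?, ?)",
        (tape_label, file_path, size_bytes, written_on),
    )
    conn.commit()


def tapes_holding(conn, file_path):
    """Return the labels of tapes holding a file, most recent copy first."""
    rows = conn.execute(
        "SELECT tape_label FROM tape_files"
        " WHERE file_path = ? ORDER BY written_on DESC",
        (file_path,),
    )
    return [row[0] for row in rows]


# Example with made-up labels and paths.
catalog = open_catalog()
record_file(catalog, "WK0001", "/video/a.mov", 42_000_000_000, "2012-02-05")
record_file(catalog, "WK0002", "/video/a.mov", 42_000_000_000, "2012-02-12")
record_file(catalog, "WK0002", "/video/b.mov", 17_000_000_000, "2012-02-12")
print(tapes_holding(catalog, "/video/a.mov"))
```

At restore time you look up the file, pull the most recent tape that holds it, and leave the other tapes on the shelf.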

If your videos are of the write-once-read-many variety, you can go a long, long way by just doing a weekly changed-data backup coupled with a data-replication solution. It would give you the 'instant recovery' of a fully replicated solution, but with an alternate recovery method in the form of the tapes. It'll mean doubling your direct storage costs, but you can't beat the time-to-recovery of having a hot spare.
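To make "weekly changed-data backup" concrete, here is a rough sketch of how the change set could be picked: anything whose modification time falls in the last seven days. This is only an illustration based on file mtimes. A real backup product tracks changes its own way, and the `/srv/video` path is a placeholder.

```python
import os
import time

WEEK_SECONDS = 7 * 24 * 3600


def changed_since(root, cutoff_epoch):
    """Yield (path, size_bytes) for files under root modified at or after cutoff."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime >= cutoff_epoch:
                yield path, st.st_size


def weekly_changes(root, now=None):
    """Return a sorted list of (path, size_bytes) modified in the last 7 days."""
    now = time.time() if now is None else now
    return sorted(changed_since(root, now - WEEK_SECONDS))


if __name__ == "__main__":
    # "/srv/video" is a placeholder for wherever the library is mounted.
    changes = weekly_changes("/srv/video")
    total_gb = sum(size for _path, size in changes) / 1e9
    print(f"{len(changes)} files, {total_gb:.1f} GB to back up this week")
```

For write-once content the list is simply the week's new files, which is why the incremental stays near 300GB no matter how big the library gets.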

If a hot spare is too rich for your blood, the cost-per-GB of tape is still well below that of disk. It'll take longer to recover, and take a long time to fully back up, but it'll get you there in the end.

sysadmin1138
2

Well, there's no "cheap" way to go about this.

I think Backblaze has already done most of the hard work for you, though. Here's an excellent article about how they built their own storage pods: http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

They open-sourced the design, and someone is selling everything you need minus the disks: http://www.protocase.com/products/index.php?e=Backblaze

LVLAaron
0

Since it doesn't sound like you're going back and changing things that have been stored once, I'd recommend tape. It needs to be managed by strong backup software which can identify bad media, but it sounds like you could get away with very little in the way of hardware. A dual head library should suffice, and that will allow you to let the backup server do background reclamation between weekly incrementals. 300GB a week will take under an hour per week on a single LTO-5 drive.
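Here is the back-of-the-envelope math behind that estimate, as a sketch. It assumes LTO-5's native (uncompressed) figures of 140 MB/s and 1.5 TB per tape, and that the drive is kept streaming the whole time; video rarely compresses further, so the native numbers are the ones to plan around.

```python
import math

# LTO-5 native (uncompressed) figures.
LTO5_NATIVE_MB_PER_S = 140
LTO5_NATIVE_CAPACITY_GB = 1500


def hours_to_write(gigabytes, mb_per_s=LTO5_NATIVE_MB_PER_S):
    """Hours to stream the given volume at a sustained rate (decimal units)."""
    return gigabytes * 1000 / mb_per_s / 3600


def tapes_needed(gigabytes, capacity_gb=LTO5_NATIVE_CAPACITY_GB):
    """Whole tapes needed, ignoring filemarks and other overhead."""
    return math.ceil(gigabytes / capacity_gb)


weekly_hours = hours_to_write(300)        # the weekly incremental
full_hours = hours_to_write(50 * 1000)    # one full pass over the 50 TB library

print(f"weekly: {weekly_hours * 60:.0f} min on {tapes_needed(300)} tape")
print(f"full:   {full_hours:.0f} h on {tapes_needed(50 * 1000)} tapes")
```

The weekly run comes to about 36 minutes on a single drive and fits on one tape. The initial full backup is the expensive part: roughly 99 hours of streaming on one drive across 34 tapes, which a second drive would cut about in half.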

You can instruct the software to prepare a set of tapes to be sent offsite for site recovery needs. This will avoid the requirement of sending 300GB per week over your WAN for some sort of replication.

Basil