I have a question about RAID 5 rebuilding.

Let's assume the following: a 3-disk RAID 5 array of 5 TB disks (i.e. 10 TB usable).

If a disk fails, the data remains accessible (e.g. employees could continue working) but in a 'crippled' state.

It's widely known that the RAID controller can rebuild the array once the failed disk has been replaced. However, there is a fair chance of hitting a URE (unrecoverable read error) during the rebuild, in which case the array cannot be rebuilt and the data is lost.

However, what is to stop you, upon finding out a disk has failed, copying all the data from the limping array to a network storage location, recreating the RAID array from scratch with a new disk, and copying the files back?

This way, any UREs encountered would manifest as individual file failures rather than complete failure.

Would this work?

Thanks in advance, Laurence

  • An intelligent RAID layer could keep track of that on its own. The end result of using such an intelligent RAID layer would be better than what you suggest. But the risk of the complete failure of one disk combined with bad sectors on other disks is just too high for me to recommend RAID 5. On a RAID 6 with two parity disks you can lose a disk and have lots of bad sectors on the rest with minimal risk of data loss due to the same sectors being bad. – kasperd Dec 24 '14 at 02:23
  • Essential reading: http://www.standalone-sysadmin.com/blog/2014/11/recalculating-odds-of-raid5-ure-failure/ – Andrew Dec 24 '14 at 05:02
  • @Andrew not as essential as one might think. The data sheet values for drives are warranties: if you encounter a URE after reading fewer bytes than specified, you might return the drive. If you look at the SMART data of a long-standing busy drive, you will find that it has typically read hundreds of terabytes of data without a single unrecoverable read in the statistics. It would be good to have some real-world statistics on this topic; most of the articles I have seen about URE calculations look like FUD. – the-wabbit Dec 24 '14 at 08:11

1 Answer

Let's assume the following: A RAID 5, 3 disk (5TB each) array.

Already we're off to a terrible start.

If a disk fails, the data remains accessible (e.g. employees could continue working) but in a 'crippled' state.

That assumes there is no URE on either of the two remaining disks in a spot where someone's file lives.
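To put a rough number on that risk, here is a back-of-the-envelope sketch. It assumes a datasheet-style rate of one URE per 10^14 bits read and treats every bit as an independent trial; both are my assumptions for illustration, not figures from the question, and as noted in the comments real drives often do much better than their datasheet rating. The function name `p_at_least_one_ure` is made up for this example.

```python
import math


def p_at_least_one_ure(bytes_read: float, ure_rate_per_bit: float = 1e-14) -> float:
    """Probability of hitting at least one URE while reading `bytes_read` bytes.

    Naive model: each bit fails independently with probability
    `ure_rate_per_bit` (an assumed datasheet-style figure).
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-ure_rate_per_bit))


# A degraded 3-disk array must read both surviving 5 TB disks in full.
degraded_read_bytes = 2 * 5e12

print(f"1 per 1e14 bits: {p_at_least_one_ure(degraded_read_bytes):.1%}")
print(f"1 per 1e15 bits: {p_at_least_one_ure(degraded_read_bytes, 1e-15):.1%}")
```

Under this pessimistic model a full read of both surviving disks is roughly a coin flip at the 10^14 rating and a single-digit percentage at 10^15. Treat it as an upper-bound style estimate, not a prediction.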

However, what is to stop you, upon finding out a disk has failed, copying all the data from the limping array to a network storage location, recreating the RAID array from scratch with a new disk, and copying the files back?

Nothing, except that now you're stressing a degraded array's disks and copying terabytes of data, which will not be quick even on a gigabit network. If your hard drives are all from the same lot, it is not uncommon for them to fail within a short time of each other. Also, if the aforementioned URE does occur, you've lost data that you will have to recover from backup.
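For a sense of scale, here is a minimal sketch of the copy time. It assumes the array is full (10 TB), the link is 1 Gb/s, and the transfer runs at a given fraction of line rate; the `efficiency` parameter and the function name are hypothetical knobs for illustration, and real throughput from a degraded array will likely be lower.

```python
def transfer_hours(data_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `data_bytes` over a link of `link_bits_per_s`.

    `efficiency` is the assumed fraction of line rate actually achieved
    (1.0 means a perfect, overhead-free transfer).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = (data_bytes * 8) / (link_bits_per_s * efficiency)
    return seconds / 3600


usable_bytes = 10e12   # 10 TB of usable space, assumed full
gigabit = 1e9          # 1 Gb/s link

one_way = transfer_hours(usable_bytes, gigabit)
print(f"one way at line rate: {one_way:.1f} h")
print(f"off and back again:   {2 * one_way:.1f} h")
```

Even in the best case that is close to a full day in each direction, during which the degraded array has no redundancy.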

This way, any UREs encountered would manifest as individual file failures rather than complete failure.

Reasonable storage controllers can surface UREs as individual file failures. A rebuild doesn't have to fail hard when a URE is encountered. See your storage controller's manual for more info.

TL;DR

Use RAID 10 or RAID 6 with hot spares and reasonably sized hard drives. Your hairline and career will thank you later.

Wesley