
It doesn't seem right to me that people are using RAID 1+0, RAID 5+0, or RAID 6+0 instead of a RAID with 3 or more parities (akin to RAID 6), because the latter offers better reliability at the same level of redundancy.

Consider the case of 4 identical 1TB drives. Here, both RAID 6 and RAID 1+0 have 50% redundancy and equivalent theoretical maximum read and write throughputs (not counting seek times or RAID controller deficiencies). The RAID 6 array can survive any 2 drive failures. The RAID 1+0 can survive any single drive failure, but has a 1/3 chance of array failure on the 2nd.

With larger numbers of drives and more parities, the differences become more obvious. For 6 identical 1TB drives and a hypothetical RAID-6-like level with 3 parities, both RAID 1+0 and this 3-parity RAID would again have 50% redundancy and equivalent theoretical maximum read and write throughputs. The 3-parity array can survive any 3 drive failures, whereas the RAID 1+0 can survive any single drive failure but has a 1/5 chance of array failure on the 2nd and a 3/5 chance of array failure by the 3rd.
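These figures are quick to verify by brute-force enumeration; here is a minimal Python sketch (the function name and drive numbering are mine, purely for illustration):

```python
from itertools import combinations
from fractions import Fraction

def raid10_failure_prob(pairs, failures):
    """Probability that a RAID 1+0 of `pairs` mirrored pairs has failed
    once `failures` of its drives, chosen uniformly at random, are dead.
    Drives 2*i and 2*i + 1 form mirror pair i."""
    drives = range(2 * pairs)
    total = failed_arrays = 0
    for dead in map(set, combinations(drives, failures)):
        total += 1
        # The array is lost iff both halves of some mirror pair are dead.
        if any({2 * i, 2 * i + 1} <= dead for i in range(pairs)):
            failed_arrays += 1
    return Fraction(failed_arrays, total)

print(raid10_failure_prob(2, 2))  # 4 drives, 2 failures -> 1/3
print(raid10_failure_prob(3, 2))  # 6 drives, 2 failures -> 1/5
print(raid10_failure_prob(3, 3))  # 6 drives, 3 failures -> 3/5
```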
So a few calculations make it fairly obvious that increasing the number of parities is theoretically a more efficient use of drives than nesting RAID levels. Why, then, aren't manufacturers adding more parities to their RAID controllers in preference to supporting nested RAID layouts? And can I create a RAID with at least 3 parities in Linux MD software RAID?

James Haigh
  • James, if you feel there are systemic issues that need examination, http://meta.serverfault.com/ is the right place to post that question. Regarding the above, I note that the mere existence of more syndromes doesn't make your question practical, as they would still have to be adopted into the standard before being implementable, and implemented before being usable; but it would be a good first step. – MadHatter Mar 11 '15 at 10:47
  • Sure, but I first want to determine whether the existing implementation of the 2nd syndrome is indeed the implementation needed for the 3rd, 4th, etc. If it's just a case of changing some parameter for each syndrome then it's _already implemented_, and the question is therefore a practical question of how to use it or why it isn't used (I'll edit it to make it more specific when I have a firmer understanding of whether it is actually theoretically possible). – James Haigh Mar 11 '15 at 10:57
  • The way the syndrome computation works - it won't scale linearly for more parity drives. But there's not really any need, when you can just make smaller RAID groups. And you will _always_ need backups, because by far and away 'user oops' is the biggest cause of data loss in the datacentre. – Sobrique Mar 11 '15 at 12:19

1 Answer


You may need to understand a bit more about RAID-6; I recommend reading [Wikipedia's explanation](https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_6).

The problem is that the second parity bit in RAID-6 isn't just a copy of the first (simple, XOR-style, as used in RAID-5) parity bit; that would be completely useless, because in the event of losing two data drives the fact that you have two surviving copies of the XOR parity bit won't be of any help. The second parity bit is derived from a completely different calculation, which has the mathematical property that, when combined with the XOR parity bit, it can recover data after two data drive failures (or after one data drive failure and the loss of the XOR parity bit).
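To make that concrete, here is a minimal Python sketch of the two RAID-6 syndromes and of two-drive recovery, following the conventions of H. Peter Anvin's "The mathematics of RAID-6" paper (GF(2^8) with the 0x11d polynomial and generator g = 2). Single bytes stand in for whole chunks here; this illustrates the maths, not MD's actual implementation:

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    return gf_pow(a, 254)  # a**255 == 1 for a != 0, so a**254 == 1/a

def raid6_syndromes(data):
    """data[i] is a byte from drive i. P is the plain RAID-5 XOR parity;
    Q weights drive i's byte by g**i before XOR-summing."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

def recover_two(survivors, x, y, p, q):
    """Rebuild the bytes of failed drives x and y; `survivors` maps the
    surviving drive indices to their bytes."""
    a, b = p, q
    for i, d in survivors.items():
        a ^= d                        # a becomes d_x XOR d_y
        b ^= gf_mul(gf_pow(2, i), d)  # b becomes g^x*d_x XOR g^y*d_y
    # Solve the 2x2 system: d_x = (b XOR g^y*a) / (g^x XOR g^y), d_y = a XOR d_x.
    dx = gf_mul(b ^ gf_mul(gf_pow(2, y), a),
                gf_inv(gf_pow(2, x) ^ gf_pow(2, y)))
    return dx, a ^ dx

data = [0x42, 0x13, 0x37, 0xbe]       # four data drives
p, q = raid6_syndromes(data)
survivors = {0: data[0], 2: data[2]}  # drives 1 and 3 have died
print([hex(v) for v in recover_two(survivors, 1, 3, p, q)])  # ['0x13', '0xbe']
```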

If you wanted to add a third, fourth, etc. parity bit, you don't just have to add more parity drives: you have to come up with more calculations that enable survival after the loss of three (four, etc.) data and/or parity drives (more syndromes, as Wikipedia has it). Each has to be something that can be calculated reasonably quickly, otherwise performance will suffer, and that doesn't require reading huge amounts of data to update the parity after a small write, for similar reasons.
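For what it's worth, coding theory does have a standard shape for such families (the comments below point at Reed-Solomon codes): syndrome j weights drive i by g**(j*i), so syndrome 0 is the familiar XOR and syndrome 1 is RAID-6's Q. A purely illustrative sketch, reusing gf_mul and gf_pow from the block above - emphatically not something MD or the DDF standard implements:

```python
def rs_syndromes(data, nparity):
    """Compute `nparity` syndromes; s[0] is P (plain XOR), s[1] is the
    RAID-6 Q, and so on. NB: with these particular weights, recovery
    from any `nparity` erasures is guaranteed only up to triple parity
    (and fewer than 255 data drives); with 4+ parities some erasure
    patterns become unsolvable, and picking weights whose every square
    submatrix stays invertible - while staying fast to compute - is
    exactly the non-trivial part."""
    s = [0] * nparity
    for i, d in enumerate(data):
        for j in range(nparity):
            s[j] ^= gf_mul(gf_pow(2, j * i), d)
    return s

print([hex(v) for v in rs_syndromes([0x42, 0x13, 0x37, 0xbe], 3)])
```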

I'm not aware of the existence of a huge stack of candidate functions for this job, all just waiting in the wings for coders to go "yeah, let's have a couple of those...", and in their absence, adding more simple copies of the data discs is conceptually easy, computationally cheap, and - let's face it - pretty cheap in practice, given how valuable the data is.

Edit on your comment, below: triangles are used heavily in construction because they are stable against in-plane deformation, which quadrilaterals are not. Given stable quadrilaterals, we could save a lot of material when making stressed structures; so why do we not use them? Answer: we don't use them because there don't seem to be any stable quadrilaterals.

The main reason we don't use more parity bits is because of the lack of suitable candidate functions which have been blessed by the appropriate standards body, and the main reason for the lack of blessing is the lack of eligible functions. The world isn't stupid; given such functions, we'd almost certainly use them - but until the candidate functions exist, their non-existence is the reason for their non-use. What else can be said on the subject?

Edit 2: right, I'm voting to close this question, because it no longer seems to me to admit of an answer.

Looking at the original question as asked, if you didn't know that the second parity is a completely different calculation from the first, that would explain the confusion; my answer was predicated on that assumption, but you say that you do understand this point.

So, knowing that you're aware that there aren't any standardised functions at this time, the answers to your question as written are:

_So why aren't manufacturers adding more parities to their RAID controllers in preference to supporting nested RAID layouts?_ Because there are no more parity calculations in the DDF standard to use.

_Can I create a RAID with at least 3 parities in Linux MD software RAID?_ No.

But now you're going past that, and saying that you think there should be lots of candidate functions. Either you don't know what you're talking about, in which case there's nothing I can say to help, or you do, in which case I suggest that you write your function up and propose it to the SNIA so it can get blessed as fast as possible. Neither of those is suitable for SF.

Or are you suggesting there is a more sinister reason for the lack of officially-blessed parity functions, and instead asking what the International Global Conspiracy to Suppress the Third Parity Function is all about? In which case I still can't help you, because I don't think it exists, and the question is still off-topic for SF.

What are you asking?

MadHatter
    I'm frankly not sure what the math looks like for ZFS's raidz3, but that's the only implementation of the kind that I'm aware of. – Shane Madden Mar 11 '15 at 06:34
  • “I'm not aware of the existence of a huge stack of candidate functions for this job, all just waiting in the wings for coders to go "_yeah, let's have a couple of those..._"” – As I understand it, the 2nd syndrome used in RAID 6 is one of a large group (presumably the number of syndromes is an exponential function of chunk size, or something along those lines). XOR is always one of those, i.e. a special case. The work that has gone into supporting the 2nd syndrome in RAID 6 should have made it trivial to use other syndromes. – James Haigh Mar 11 '15 at 07:14
  • I don't understand why your answer, which currently doesn't really answer the question, has already gained 3 upvotes, yet my well-researched, legitimate question has had 2 downvotes and is currently on -1. If you're really sure that “the main reason we don't use more parity bits is because of the lack of suitable candidate functions which have been blessed by the appropriate standards body, and the main reason for the lack of blessing is the lack of eligible functions” then please provide a reference to where a standards body has confirmed this. I very much doubt that this is true. – James Haigh Mar 11 '15 at 07:32
  • @JamesHaigh Your question is (now) more theoretical than practical. Here on SF we deal with the practicalities of managing computing systems; we largely don't care how they work, just that they do, as long as you follow these rules... I think your question is interesting but it's _not_ a SF question. Perhaps [Theoretical Computer Science](http://cstheory.stackexchange.com/) is a better home. – user9517 Mar 11 '15 at 07:46
  • @Iain: I see. Yes, I'm very much a “why?” person, so I'm clearly not welcome here. But thanks for the heads-up. – James Haigh Mar 11 '15 at 07:51
  • @JamesHaigh I'm very much a why person too, but sometimes the answer to why? is because. Consider _Why does the earth go round the sun?_ Initially we're taught Newtonian mechanics; it works, the maths is relatively simple, and for most people who care why, it's good enough (SF). However, as our interest in celestial mechanics and mathematical ability increases, we learn the same things can be expressed in Einsteinian physics. The maths and concepts are much harder and more abstract. Then there's quantum mechanics ... – user9517 Mar 11 '15 at 08:09
  • I'd add - even if more syndromes did exist, the RAID 6 write penalty would increase with each one. A WP of 6 on slow SATA drives (which are what need the reliability) is already bad enough. A WP of 8, 10, more? No thanks. – Sobrique Mar 11 '15 at 12:21
  • In the world of serverfault the answer to "why" is almost invariably "because money". Most things that aren't done are because the cost-benefit of doing so doesn't pan out. RAID-6 only really exists because of UBER rates on big slow drives meaning that odds of compound failures on RAID-5 sets were getting 'too high'. – Sobrique Mar 11 '15 at 12:24
  • I'm aware of two, maybe three sets of formulas for "extended parity RAID": two proposals for extending Linux software RAID (one set of formulas extends to arbitrary amounts of parity, the other is faster under some circumstances), plus raidz3, which may or may not use the same formulas as one of the Linux proposals. – Mark Mar 11 '15 at 19:45
  • Such functions have been well known for [half a century](https://en.wikipedia.org/wiki/Error_detection_and_correction#History). The classical parity bit is just a special case. Multiple parity drives are [well doable](https://web.eecs.utk.edu/~plank/plank/papers/CS-96-332.pdf), and may become necessary as recovery times and multiple-failure probabilities grow with capacity. – maaartinus Jan 12 '18 at 00:10
  • Well, [there is a triple parity RAID-7](https://en.wikipedia.org/wiki/RAID#:~:text=RAID%206%20(RAID-Z2)%20double-parity,%20and%20a%20triple-parity%20version%20(RAID-Z3)%20also%20referred%20to%20as%20RAID%207), and yet it seems not to be widely supported *shrug* – Hi-Angel Jun 02 '21 at 13:06