2

I will be setting up a server at work and I need some advice on some details. The setup will be one blade-type server (8-core, 16GB RAM) with two subsystems - one for the main storage, the other to back it up. I'm shooting for a 20TB array (I know it'll be less after formatting and parity drives).

Is there any advantage one way or the other with either 20 1TB drives or 10 2TB drives?

I'm not sure right now how many controllers I should have either (the quote I have includes a dual-port controller). I would think two controllers for a server of this size would be a better choice than the dual-port controller (but I really don't know).

Would an array of this size have any performance issues in RAID 5 or 6? (I know RAID 5 and 6 are "slower" because of all the parity calculations.)

Also, these will be either WD RE3 (1TB) or the RE4 (2TB).

Oh, also, for the backup array would it be ok to use the WD 2TB green drives (also in RAID5 or 6)?

pauska
  • Don't forget that rebuilding an array of that size is likely to be a painful experience. – pehrs Mar 12 '10 at 09:45
  • Regardless of which way you go you should really factor in a couple of hot spares. – John Gardeniers Mar 12 '10 at 10:41
  • One spare should be sufficient with RAID 6. – pauska Mar 12 '10 at 12:27
  • In principle one spare would be enough for RAID 6 but with such large drives there is an increased risk of another drive going down while the array is rebuilt, which will take a fair bit of time. If it was my system I'd want that second spare. – John Gardeniers Mar 13 '10 at 06:35
  • Hot spares should be avoided on RAID 5/6 arrays this big; the automatic rebuild is the problem. With arrays this big, chances are another drive will fail during the rebuild. Once a drive fails, you want to check your backup and your options before rebuilding, and then start a manual rebuild over the weekend. – Posipiet Mar 16 '10 at 14:53

7 Answers

4

20 x 1TB drives: more I/O performance, at least for reads (depends on how you set them up).

10 x 2TB drives: less I/O performance, and problematic with RAID 5 - go for RAID 6 to avoid trouble in case a disk fails. 2TB drives are about the limit for a reliable rebuild.

I'm not sure right now how many controllers I should have either (in the quote I have is a dual-port controller).

You should go for a SAS setup. It would be OK to use one SAS port, though I would go with a dual-port cage and a dual-port controller. I have a setup similar to that (well, smaller and faster) using WD 300GB VelociRaptors, an Adaptec 5805, and a Supermicro cage with 24 slots in one rack unit (all 2.5") with a SAS backplane.

Note that SAS controllers can handle SATA drives - they are logically and physically compatible, you can just plug them in.

The Adaptec handles up to around 190 disks ;) and has plenty of CPU performance for that little I/O.

Oh, also, for the backup array would it be ok to use the WD 2TB green drives (also in RAID5 or 6)?

Backup - in general, slower is OK. So, yes.

TomTom
2

If you're using the controller to do your RAID then having two controllers will obviously mean you'll have two separate arrays (i.e. 2 x 10 disks or 2 x 5 disks); if you're using software RAID then this isn't a problem.

Also something to be aware of: 20 disks in a RAID 6 config gives 18 * (1000GB * 0.95) = 17.1TB. Windows (you don't mention what OS you're using) will support 16TB volumes with the largest cluster size, but any more than that is 'unsupported' at best; if it's Linux then things vary.
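
If you want to play with the numbers, that arithmetic is easy to script. A quick back-of-envelope sketch in Python (assuming RAID 6 loses two disks to parity, RAID 10 half the disks, and the same rough 0.95 formatting allowance as above; adjust to taste):

    # Rough usable-capacity estimate for a few simple RAID layouts.
    # The 0.95 factor is the same rough formatting/GiB allowance as above.
    def usable_tb(disks, disk_gb, level, overhead=0.95):
        if level == "raid5":
            data_disks = disks - 1
        elif level == "raid6":
            data_disks = disks - 2
        elif level == "raid10":
            data_disks = disks // 2
        else:
            raise ValueError("unknown RAID level: %s" % level)
        return data_disks * disk_gb * overhead / 1000.0

    print(usable_tb(20, 1000, "raid6"))   # ~17.1 TB, as above
    print(usable_tb(10, 2000, "raid6"))   # ~15.2 TB
    print(usable_tb(20, 2000, "raid10"))  # ~19.0 TB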

Now onto the question, and my answer comes down to how important the data you're storing is. I consider my primary focus to be the maintenance of existing data, everything else is secondary. With that in mind I'd be tempted to go for a 20 x 2TB RAID 10 configuration if you have the budget, if not then the 10 x 2TB RAID 6 config, and lastly the 20 x 1TB RAID 5. I'd also make sure I had a few spare drives lying around nearby and that my drive-failure alerting was working fine too.

Regarding backup then for intermittent (i.e. non-24/7) use those green drives are fine.

Chopper3
0

Try to stay away from RAID 5 and big (1TB+) disks. I got myself into that trap, and I'm not going to wait for a drive failure before I redesign my RAID.

Like Chopper said, if you have the budget: run 20 x 1TB in RAID 10, or with dual controllers, 2 x 10 x 1TB. It will give you much better write speed, although only 10TB of usable storage.

If you really need 20TB of usable storage then I'd run 2 x 10 x 2TB in RAID 10.

pauska
0

Twenty disks in one RAID 5 or 6 array is just asking for trouble. The failure chance on a rebuild is way too high. I would not go beyond eight drives in a RAID 6. Take the size of the array in bits, divide it by the unrecoverable read error rate of the disks, and do the math.
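
To make "do the math" concrete, here is a rough sketch of the usual estimate: the chance of hitting at least one unrecoverable read error (URE) while re-reading every surviving disk during a rebuild, assuming the commonly quoted rate of one URE per 10^14 bits. The figures are illustrative, not predictions for any particular drive.

    import math

    # P(at least one URE) while reading every surviving disk once during a
    # rebuild, via a Poisson approximation: 1 - exp(-expected number of UREs).
    def rebuild_ure_probability(surviving_disks, disk_tb, ure_bits=1e14):
        bits_read = surviving_disks * disk_tb * 1e12 * 8  # TB -> bits
        return 1.0 - math.exp(-bits_read / ure_bits)

    # 20 x 1TB RAID 5 with one failed disk: 19 disks must be read in full.
    print(round(rebuild_ure_probability(19, 1.0), 2))  # ~0.78
    # 10 x 2TB RAID 5 with one failed disk: 9 x 2TB to read.
    print(round(rebuild_ure_probability(9, 2.0), 2))   # ~0.76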

To know more about RAID 5/6, look at http://download.intel.com/design/storage/papers/30812202.pdf (page 8) or similar papers.

The way to go beyond 10TB or so is RAID 10. Yes, it is far less efficient space-wise, but it is superior in performance and in reliability on rebuild, because only a mirror needs to be rebuilt.

Posipiet
  • We have 25 x 300GB 2.5" 10krpm SAS drive arrays (actually HP MSA70s), set up as 23+2 RAID 6, that we use for 99.99% reads, and they're actually very reliable, but for most scenarios you're right. – Chopper3 Mar 12 '10 at 10:29
  • Did you have a successful rebuild on that RAID? – Posipiet Mar 16 '10 at 12:17
0

Looking at it from the engineering side, assuming they are constructed the same, there is a higher probability of the larger-capacity drives giving trouble simply because everything is just that much more critical. Anything that moves must result in wear, and the more critical everything is, the more effect that wear will have. Plus, the tracks are getting so narrow on high-capacity drives that a fault or blemish of just a few molecules on the magnetic coating can result in bad blocks.

When there's a choice, and all else being equal, I will always opt for lower-capacity drives. I'm still using drives from more than a decade back, but I'm pretty confident that none of today's higher-capacity drives will still be functioning in 10 years.

John Gardeniers
0

OK, thanks for the answers. I guess I should give a few more details. I work in a research lab and have a budget of about $13K to spend on this server. We are doing sequencing experiments that will generate ~50GB of data per run and some microscopy experiments generating 25-50GB of data per run. I'm not entirely sure how many of these experiments will be done or for how long. I initially got a quote for a 16TB array; in it were 16 1TB SAS drives (Seagate ES.2 SAS, 7200 RPM). I've read nothing to suggest that 7200 RPM SAS drives are better than the RE3 drives, so since they are about $60 per drive cheaper, that gives me a lot more money to upgrade the CPU and RAM specs. But if I go back to the 16TB solution (which is probably more than enough for some time) and choose a 16TB array with RE3 drives in RAID 10 (32 drives) and use 2TB drives in RAID 6 for the backup, that is still almost $1K cheaper than the 2 x 16TB SAS (7200 RPM) subsystems, which is money I can use to get faster CPUs. I should also point out that a neighboring lab has a 12TB and a 32TB array made entirely of Hitachi 7K1000 (1TB) drives and they have not had any problems with this configuration. Since we aren't an "enterprise"-class organization I'm not sure why SAS would be any better.
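
Just to put rough numbers on that growth rate: the run sizes above are from the experiments, but the runs-per-week figures in this little sketch are only placeholder guesses, so plug in real numbers.

    # Back-of-envelope capacity planning. Run sizes (50GB sequencing,
    # 25-50GB microscopy) come from the post; runs per week are assumptions.
    def months_until_full(array_tb, gb_per_week):
        weeks = array_tb * 1000.0 / gb_per_week
        return weeks / 4.33  # average weeks per month

    # e.g. two 50GB sequencing runs + three 40GB microscopy runs per week
    weekly_gb = 2 * 50 + 3 * 40
    print(round(months_until_full(16, weekly_gb), 1))  # ~16.8 months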

To answer Chopper's 1st question: the quote I'm looking at says "LSI Dual Port 8880M2 Controller with BBU installed". This would control two identical subsystems (per the quote), though what I'm describing is two separate but nearly identically sized arrays (would that be a problem?). Another option is a master node with a subsystem (no CPU) for backup. That has a 3Ware 96904i4e controller with BBU installed.

For Chopper's 2nd question, I don't know which OS we will be using yet. Are you saying Windows Server 2008 can't handle arrays larger than 16TB? Or just "unsupported" as in no official MS support? I think the vendor uses CentOS for the Linux option, which I can't stand. We have CentOS on our microscope computer and it's been a pain. I need to first make sure which platform all our applications run on; so far Matlab, R, and the aligners I use run on Linux, so no problem there. I could always install Ubuntu (or Mint, my fave) on it instead.

There are so many configuration options it's enough to make my head spin! Thanks for all your help so far. I hope I can contribute back. This is a helpful website.

captainentropy
  • OK, so I asked about a build of a 16TB RAID 10 system using RE3 drives and was told upgrading the case and controller to handle 32 1TB drives would be too expensive. The new quote is 16TB using RE4 drives (2TB) in RAID 10 for the master system and the 2TB green drives for the backup. Erik suggests not going over 1TB drives, and it seems like most of you would rather use RAID 10 over RAID 5/6 - makes sense to me. Faster and more fault tolerant, right? But since this isn't an enterprise environment should I really be concerned about using 2TB RE4 drives? They are designed to be enterprise-class after all. – captainentropy Mar 16 '10 at 03:15
  • Hi. 2TB drives don't have more faults than 1TB drives, it's just that rebuilding an array after a 2TB disk failure takes twice the time that an array with 1TB disks would. Remember that performance suffers (a lot) during a RAID rebuild (a rough time estimate is sketched below). – pauska Apr 21 '10 at 09:44
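
A rough illustration of that point: a rebuild has to write the whole replacement disk, so for a fixed sustained rate the time scales with drive capacity. The 60 MB/s rate below is just an assumed figure, not a measurement.

    # Rough rebuild-time estimate; 60 MB/s sustained is an assumed figure.
    def rebuild_hours(disk_tb, rebuild_mb_per_s=60.0):
        return disk_tb * 1e6 / rebuild_mb_per_s / 3600.0

    print(round(rebuild_hours(1.0), 1))  # ~4.6 hours for a 1TB drive
    print(round(rebuild_hours(2.0), 1))  # ~9.3 hours for a 2TB drive
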
0

I agree with Posipiet. Too many disks will greatly increase the chance of failure and thus defeat the benefits of RAID.

Also, do consider raidz2 (ZFS).

DennyHalim.com