We operate a server for computations on large data sets (100 GB to 5 TB) for a small group of researchers (10-20 people). At the moment we are running 4x8 TB spinning drives in a RAID 6 using mdadm, with Btrfs on top of that (no Btrfs RAID). Note that 4 disks in a RAID 6 does not make much sense; we planned to add more disks from the very beginning. We are experiencing very slow random I/O reads and writes, especially when multiple jobs are running at the same time (multiple users working on the machine). For example, IPython can take up to 10 seconds to load (IPython reads tons of config files and imports quite a few Python modules). We have not yet been able to figure out the root cause of this issue. Writing or reading a single huge file is fast. We suspect the hardware (did we choose the wrong disks?), the RAID, or some kind of misconfiguration. Unfortunately, there is not a lot of budget for really expensive hardware.

Since we want to set up the system again from scratch, I would like to ask whether RAID 6 was a good choice for our use case in general.

Question:

Is RAID 6 a good choice for running a computation server?

lumbric
  • "we are experiencing very slow random I/O reads and writes." - because 8 TB discs are dead slow. You basically said goodbye to performance when you got slow discs. Not sure how you could NOT figure out the root cause, which is a comically low IO budget per disc. – TomTom Oct 14 '20 at 14:41
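
As a rough illustration of the "IO budget per disc" point, the sketch below estimates the random IOPS of a single spinning drive from its average seek time and rotational latency. The 7200 rpm spindle speed and 8.5 ms average seek time are assumed, typical-looking values, not figures for the drives in the question, and the function name is hypothetical.

```python
def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Approximate random IOPS of one spinning disk (simple service-time model)."""
    # On average the platter turns half a revolution before the sector arrives.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    # One random access costs roughly one seek plus the rotational latency.
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    # Assumed values for illustration, not measurements of the actual drives.
    per_disk = random_iops(rpm=7200, avg_seek_ms=8.5)
    print(f"per disk: ~{per_disk:.0f} random IOPS")
    print(f"4 disks, reads spread evenly: ~{4 * per_disk:.0f} random IOPS")
```

Under these assumptions a single drive manages on the order of tens of random operations per second, regardless of its capacity, and that budget is shared by every concurrent user and job.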

1 Answer

Is RAID 6 a good choice for running a computation server?

'It depends' is the stock answer, sorry, but it really does depend on your disk IO needs; there's no 'one size fits all'.

What I will say is that R6/60 is a generally fine option that may well suit your needs. The only other real option is R1/10, which is faster but obviously uses more disk space. I also know a lot of people really like ZFS, if that's appropriate for your setup. What you should avoid entirely is R5/50: it's essentially dead, has been for years, and is basically dangerous.
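
To make the space and speed trade-off concrete, here is a minimal sketch comparing RAID 6 and RAID 10 for the four 8 TB disks from the question. It uses the usual textbook write penalties (6 disk operations per small random write for RAID 6, 2 for RAID 10) and an assumed 80 random IOPS per disk. It ignores caches, full-stripe writes, and file system overhead, and the function names are hypothetical.

```python
# Disk operations needed per small random write (textbook rule of thumb).
WRITE_PENALTY = {"raid6": 6, "raid10": 2}


def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity in TB for an array of identical disks."""
    if level == "raid6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) * disk_tb  # two disks' worth of parity
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return (disks // 2) * disk_tb  # every block is stored twice
    raise ValueError(f"unsupported level: {level}")


def random_write_iops(level: str, disks: int, per_disk_iops: float) -> float:
    """Rough small-random-write IOPS for the whole array."""
    return disks * per_disk_iops / WRITE_PENALTY[level]


if __name__ == "__main__":
    for level in ("raid6", "raid10"):
        capacity = usable_tb(level, disks=4, disk_tb=8)
        writes = random_write_iops(level, disks=4, per_disk_iops=80)  # 80 is assumed
        print(f"{level}: {capacity:.0f} TB usable, ~{writes:.0f} random write IOPS")
```

With only four disks both layouts give the same 16 TB of usable space, while the estimated small random write rate of RAID 10 is three times that of RAID 6. RAID 6 only pulls ahead on capacity as more disks are added.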

Chopper3
  • R5/50 is dangerous because only 1 disc can fail? We've seen even ssh logins take many seconds; that shouldn't be the case with RAID 6 even if the disk cache is empty, right? Otherwise I guess the only valid use case for RAID 6 would be backup servers, where latency does not matter. – lumbric Oct 14 '20 at 14:52
  • @lumbric RAID 5 is effectively dead because the risk of a second drive failure during the rebuild time has become too high with the advent of terabyte disks. – Michael Hampton Oct 14 '20 at 16:43
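
A related, commonly cited way to put a number on the rebuild risk is the chance of hitting an unrecoverable read error (URE) while reading every surviving disk in full. This is not the second-drive-failure calculation itself, just a back-of-envelope sketch. It assumes independent errors at a datasheet rate of one per 10^14 bits read (an assumed value; real drives often do better), and the function name is hypothetical.

```python
import math


def p_ure_during_rebuild(surviving_disks: int, disk_tb: float,
                         ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE while reading all surviving disks once."""
    bits_read = surviving_disks * disk_tb * 1e12 * 8  # decimal TB -> bits
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # A 4x8 TB RAID 5 with one failed disk must read the 3 survivors in full.
    print(f"URE rate 1e-14: {p_ure_during_rebuild(3, 8):.2f}")
    print(f"URE rate 1e-15: {p_ure_during_rebuild(3, 8, ure_per_bit=1e-15):.2f}")
```

For a 4x8 TB RAID 5 (three surviving disks) this model gives roughly an 85% chance of at least one URE during the rebuild at the assumed rate. RAID 6 can still correct such an error from its second parity while only one disk is missing.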