0

The machine will store and serve millions of small files (<15 KB each), and together those files require 400 GB of storage. Assume the exact same make and model of SATA hard drives in the exact same environment (OS, CPU, RAM, RAID controller, etc.).

Which one of the setups below would be faster?

  1. RAID 1 with 2 drives of 2T each, making up total storage of 2T
  2. RAID 10 with 4 drives of 2T each, making up total storage of 4T

[EDIT]: I'm aware RAID10 is faster than RAID1. The larger the disk, at least in theory, the longer seeks/writes will take. So, will the performance gain of RAID10 be outweighed by the "drag" caused by the larger disk area when seek/write operations happen?

Stefan Lasiewski
  • It's impossible to give you a correct answer with the info you've given as it largely depends upon the hardware you have and how the files are being accessed (predominantly read only or read/write?) – Bryan Apr 04 '12 at 09:41
  • Like Bryan said, it's impossible to tell you how fast it'll be, because we don't know how your application performs. The only sure thing is that millions of 15 KB files on SATA disks are NOT going to be fast, as IOPS is going to limit you, especially if the writes are random. – pauska Apr 04 '12 at 10:49
  • Responding to the Edit: Areal density generally does not slow drives down. The "size" difference is literally the physical size. 2.5" drives have shorter seek times than 3.5" drives; the difference is quite noticeable. – Chris S Apr 04 '12 at 13:05
  • Can I add the option of using SSD, please? Cost for 512 GB is not that high anymore with SSD. Your setup will be simple too. – imel96 May 21 '13 at 02:12

3 Answers

7

RAID10 with 4 drives will be quicker -- the extra spindles mean that twice as many IOPS can be handled (more or less).

womble
  • Surely RAID10 is faster than RAID1. But what about the extra disk area that raid10 will have to seek/write? Won't it limit the performance gain? –  Apr 04 '12 at 10:39
  • 4
    I think that the OP isn't visualizing how head movement works on an array quite the right way. There isn't any "extra disk area" to be swept by a single arm. Everyone knows that a loss in drive performance is caused by movement of the drive head/arm mechanism. Moving the head around, particularly if it is jumping back and forth, kills performance. It's easy to visualize this for a single drive. When considering multi-drive arrays, the swept area does not increase on a per-drive basis. You will have four independently swept areas with four head/arm mechanisms, all working simultaneously. – Darin Strait Apr 04 '12 at 12:09
1

Every spindle in a RAID contributes the same number of IO/s it can perform, as well as the same number of MB/s it can perform on sequential IO. The size of the drive has very little to do with these numbers; the only thing that matters is spindle speed.

Since you're controlling for drive characteristics, a 4-drive RAID-10 should, in theory, read and write at twice the speed (both average IO/s on random IO and MB/s on sequential IO) of a 2-drive RAID-1.
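As a rough illustration of the spindle-count argument, here is a back-of-envelope sketch. The per-drive figure of 80 random IO/s and the helper name `mirror_array_iops` are assumptions made for the example, not measurements or anything from this thread; only the ratio between the two arrays matters.

```python
# Idealized model only: PER_DRIVE_IOPS is an assumed placeholder value,
# not a benchmark of any real SATA drive.
PER_DRIVE_IOPS = 80


def mirror_array_iops(drives: int, per_drive_iops: float = PER_DRIVE_IOPS) -> dict:
    """Idealized random IO/s for a RAID-1 or RAID-10 array of `drives` disks."""
    if drives < 2 or drives % 2:
        raise ValueError("mirrored arrays need an even number of drives (>= 2)")
    return {
        # Either copy of a mirror can serve a read, so every spindle helps.
        "read": drives * per_drive_iops,
        # Each logical write lands on both disks of a mirror pair.
        "write": drives * per_drive_iops / 2,
    }


raid1 = mirror_array_iops(2)
raid10 = mirror_array_iops(4)
print("RAID-1 :", raid1)
print("RAID-10:", raid10)
print("read ratio :", raid10["read"] / raid1["read"])
print("write ratio:", raid10["write"] / raid1["write"])
```

Whatever per-drive number you plug in, both ratios come out as 2 in this model, because the read and write figures scale with the same spindle count.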

Since you're comparing RAID-1 with RAID-10, the ratio of read to write performance is the same. If you were comparing RAID-0 with RAID-10, however, the RAID-10's writes would not be twice as fast despite its higher spindle count, because a RAID-10 has to perform two physical writes for every one write the server does. If you were instead to compare it to a 4-drive RAID-5, the RAID-5 could have a theoretical advantage on writes, as it only has to do one extra write for every three writes you do (ignoring parity calculation).
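The write overheads described above can be put side by side in a small sketch. It follows this answer's simplified model: the RAID-5 figure assumes full-stripe writes on 4 drives (3 data + 1 parity) and ignores parity calculation, and the names used here are made up for the illustration.

```python
from fractions import Fraction

# Physical writes issued per logical write, per the simplified model above.
# Assumption: RAID-5 is written in full stripes on 4 drives (3 data + 1 parity).
PHYSICAL_WRITES_PER_LOGICAL = {
    "RAID-0": Fraction(1),
    "RAID-10": Fraction(2),
    "RAID-5": Fraction(4, 3),
}


def relative_write_throughput(drives: int, level: str) -> Fraction:
    """Logical write throughput, in units of one drive's write throughput."""
    return Fraction(drives) / PHYSICAL_WRITES_PER_LOGICAL[level]


for level in PHYSICAL_WRITES_PER_LOGICAL:
    print(level, float(relative_write_throughput(4, level)))
```

For small random writes, which is closer to the workload in the question, RAID-5 typically has to read the old data and parity before writing the new ones, so the full-stripe figure is a best case rather than an expectation.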

Basil
1

One thing to keep in mind here is that the average seek time reported by the drive is just that: an average. It's the average amount of time it takes to perform a given seek, assuming that the seek could send the head anywhere on the drive.

In this case, you're spreading 400GB over 2TB drives. That means you'll have a lot of empty space on the drives. Thanks to smart storage controllers, as long as your data doesn't get fragmented (and it shouldn't) everything will be stored at the front of the disk. This will keep your seek times below that average for each disk, because you're only going to perform seeks to locations near the rest position of the read/write head.

This is relevant to the question, because when you go from two mirrored disks to four, you suddenly spread that data out a little thinner and only have 200GB stored on each disk. This means even faster actual seek times. It also means twice the heads doing the seeking. You're cutting your seek latency to about a quarter.

In other words, the RAID 10 option will be a huge win.
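The per-disk figures behind this argument can be checked with a few lines of arithmetic. This is only a sketch: it assumes 2T means 2000 GB and that the data is spread evenly across mirror pairs, and `per_disk_usage` is a name invented for the example. It says nothing about actual seek times, only about how much of each platter the heads have to cover.

```python
def per_disk_usage(data_gb: float, drive_gb: float, drives: int) -> tuple[float, float]:
    """Return (GB stored on each disk, fraction of each disk in use) for RAID-1/10."""
    mirror_pairs = drives // 2            # data is striped across mirror pairs
    per_disk_gb = data_gb / mirror_pairs  # both disks in a pair hold the same data
    return per_disk_gb, per_disk_gb / drive_gb


raid1 = per_disk_usage(400, 2000, drives=2)
raid10 = per_disk_usage(400, 2000, drives=4)
print("RAID-1 :", raid1)   # 400 GB per disk, 20% of each drive
print("RAID-10:", raid10)  # 200 GB per disk, 10% of each drive
```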

Joel Coel