
My storage is currently 6TB, and since it will grow to 30TB in a few months I would like to hear some tips/recommendations on filesystem and stripe element size so I don't run into problems in the future. 90% of the files are 700MB-4GB (mainly large video files and archives).

Right now I am using ext4 with a 64KB stripe size. Should I increase the stripe size to 128KB/256KB? Would ZFS or XFS be better than ext4? Current usage is 85% read and 15% write. Once the enclosure is full, the workload will be 100% read, and I would like the best possible throughput.

Wiggler Jtag

1 Answer


Try this: do not use RAID 5 on anything with drives 2TB or larger ;) For 30TB I would even go with RAID 6 mirrored (i.e. 2 copies in software RAID) to make sure I keep the data in case of corruption.
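As a back-of-the-envelope sketch of what that costs in drives (the 4TB drive size below is just an example value, not a recommendation):

```python
import math

# Rough disk counts for the 30TB target. RAID 6 spends two drives on
# parity; "RAID 6 mirrored" here means a second, identical RAID 6 set.
# The 4TB drive size is only an example.

disk_tb = 4
target_tb = 30

raid6_disks = math.ceil(target_tb / disk_tb) + 2   # data drives + 2 parity
mirrored_disks = raid6_disks * 2                   # second copy of the whole set

print(f"plain RAID 6:    {raid6_disks} x {disk_tb}TB drives")
print(f"mirrored RAID 6: {mirrored_disks} x {disk_tb}TB drives")
```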

> Now I am using ext4 and 64KB stripe size. Should I increase stripe size to 128KB/256KB?

Hardware or software? Generally yes - it is a lot less work to read more data up front than to come back for it later. Not a Linux guy here - but SQL Server, for example, reads 64KB extents yet tries to keep table data in linear blocks so IO is reduced. A good large-file filesystem will try the same, which means an IO segment size larger than 64KB is good.
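As a rough illustration of that point (pure arithmetic, assuming perfectly sequential reads and no read-ahead coalescing):

```python
# How many chunk-sized requests a sequential read of one large file
# generates at different chunk sizes.

FILE_SIZE = 2 * 1024**3  # a typical 2GB video file from the question

for chunk_kb in (64, 128, 256, 512):
    chunk = chunk_kb * 1024
    print(f"{chunk_kb:>3}KB chunks -> {FILE_SIZE // chunk:>6} requests per 2GB file")
```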

I remember an analysis of enterprise-level RAID controllers that showed an increase in throughput at 512KB/256KB compared to smaller sizes - especially if you have enough caching to make it "stick" at the RAID controller level.

A lot also depends on how the data is read. Large archives and files are mostly linear, non-random access. That will fly. I have a smaller system, but we do redundant reading from nearly 200 processes on it across a larger number of machines - the machines on 1Gb links, the storage on 10Gb - so it is HEAVILY random IO coming in, and I now use a RAID 6 of 8 VelociRaptors. That delivers half a gigabyte per second. 256KB stripe, RAID 6, 1GB cache on an Adaptec 71605Q. SSD cache available but not active for that group ;)

A lot depends on read patterns.
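If you do move to a larger stripe, it also pays to tell ext4 about the geometry so allocation aligns with full stripes. A minimal sketch of the arithmetic - stride and stripe-width are real mkfs.ext4 -E options, but the 8-drive/256KB geometry is just an example taken from the setup above:

```python
# Deriving ext4 alignment parameters from the RAID geometry.

chunk_kb = 256                       # RAID chunk (stripe element) size
block_kb = 4                         # ext4 block size (default 4096 bytes)
data_disks = 8 - 2                   # 8 drives minus 2 parity in RAID 6

stride = chunk_kb // block_kb        # filesystem blocks per chunk
stripe_width = stride * data_disks   # blocks per full data stripe

print(f"mkfs.ext4 -E stride={stride},stripe-width={stripe_width} /dev/mdX")
# -> mkfs.ext4 -E stride=64,stripe-width=384 /dev/mdX
```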

But stay away from RAID 5 for those large drives. That is gambling with the data - unless you can live with losing the array (when a second failure blows the RAID during a full rebuild after the first drive failure) and have another backup source (like tapes). You can basically expect a problem with that many 4TB drives, mathematically.
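The maths behind that gamble, as a sketch - it assumes the commonly quoted consumer-drive spec of one unrecoverable read error (URE) per 1e14 bits read, which real drives may beat or miss:

```python
# Odds of hitting a URE while reading the whole array during a RAID 5
# rebuild. Treat the numbers as an illustration, not a prediction.

data_to_read_tb = 28                     # surviving data read during rebuild
ure_per_bit = 1e-14                      # consumer-drive spec-sheet figure

bits_read = data_to_read_tb * 1e12 * 8
p_at_least_one = 1 - (1 - ure_per_bit) ** bits_read

print(f"expected UREs during rebuild: {bits_read * ure_per_bit:.2f}")
print(f"P(at least one URE):          {p_at_least_one:.0%}")   # ~89% here
```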

TomTom
  • I've changed 4TB to 6TB; I mean 3x 2TB HDDs. Sorry for that, but thanks for your post - I will stick with RAID 6. – Wiggler Jtag Jan 28 '14 at 16:50
  • 1
  • How do you plan on handling backups? Note that backups also protect against user errors. A customer of ours managed to wipe a group workspace during end-of-year cleanup - and had refused backups. That was a TON of money lost in reconstructing data (still going on). – TomTom Jan 28 '14 at 16:53
  • The data is not that important, so I do not plan to create backups. I will happily go with RAID 6. Anyway, just a curious question: if one disk totally fails, how does RAID 6 go about restoring that disk? How much time is needed to restore a 2TB disk? Why is it better to have 2 parity disks instead of just one (as in RAID 5)? Are those 2 parity disks in RAID 6 exactly the same - just a copy in case one of them fails - or do they hold different information? – Wiggler Jtag Jan 28 '14 at 17:54
  • They are not the same - but basically RAID 6 can rebuild the data from the rest. How long it takes depends on the disk, the RAID subsystem and configuration, and load (if there is a lot of IO, the rebuild may take longer). I am not sure whether the parity disks are absolutely identical - I do not think so. After all, the 2 failing disks may be non-parity disks, so I think it is just different maths. For an admin that is an implementation detail, though - I never programmed RAID, so... no idea. – TomTom Jan 28 '14 at 18:06
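To make the "different maths" in that last comment concrete: in RAID 6 the P block is a plain XOR of the data blocks, while the Q block is a weighted sum over the Galois field GF(2^8) (Linux md uses the polynomial 0x11d), so the two parities hold genuinely different information rather than being copies. A minimal toy sketch, not how any real controller is implemented:

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result

def raid6_parity(data_blocks):
    """P = plain XOR; Q = XOR of blocks weighted by powers of 2 in GF(2^8)."""
    p, q, weight = 0, 0, 1
    for d in data_blocks:
        p ^= d
        q ^= gf_mul(weight, d)
        weight = gf_mul(weight, 2)   # next drive gets the next power of 2
    return p, q

# One byte from each of three data drives:
p, q = raid6_parity([0x11, 0x22, 0x33])
print(f"P = {p:#04x}, Q = {q:#04x}")   # P = 0x00, Q = 0x99 - not copies
```

Because P and Q are computed differently, losing any two disks leaves two independent equations in two unknowns, which is why dual parity survives a double failure where RAID 5 cannot.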