
My question is regarding the file system performance of EXT4 volumes relative to size. We have a NAS running any arbitrary Linux platform. The NAS has (12) 4TB disks in a hardware RAID 6 with LVM resulting in approximately 40 TB of usable storage. The NAS is a file server (netatalk and samba). As the storage becomes exhausted we will attach SAS passive expansion units.

My questions:

  1. For optimal performance, are we best served by beginning with a single monolithic 40 TB DATA volume and growing it, or would we be better off creating many smaller DATA volumes?
  2. How much is the answer to question 1 dictated by filesystem type, usage, etc.?
  3. Are there any best practices or rules for calculating optimal volume size?
sardean
  • Will you be creating a single share or multiple shares (e.g. one for user home directories and additional ones for departments and projects), and are disk quotas going to be assigned? Traditional Linux quota is assigned per file system, and multiple LVM volumes = multiple file systems = multiple quota levels. AFAIK the only exception is xfs, which supports directory-level quota. – HBruijn Jan 24 '15 at 05:54
  • @hbruijn good point, these volume(s) will not be used for serving home folders. – sardean Jan 24 '15 at 15:11

1 Answer


The paranoid in me, scarred by years of experience and not trusting of recent improvements, does not like the idea of a single large EXT4 volume for what sounds like unstructured file-serving. EXT4 is probably good enough for the task in recent kernels, but it still smells like an EXT filesystem. The failure-modes with those are not good and... I don't trust 'em.

For years I'd go for XFS if at all possible, as it is designed for huge scale, and recent improvements have fixed up a lot of the inode performance problems it was known for in the past (apparently I trust improvements in filesystems I like). And it doesn't fsck, or even claim to need it. Which is nice, since fscking a 40TB EXT4 volume would take a very long time, and that's time that counts against the downtime budget.

ZFS or btrfs are the way of the future, though. Support isn't fully enterprisy (or licensed) in the kernels just yet, though btrfs is really close to earning that accolade. If you're willing to deal with possibly not-yet-bulletproof newness, the bullet-proofing should come in future kernel updates and you won't have to deal with changing your filesystem. Also, with these the concept of 'multiple volumes' doesn't quite apply, as the FS itself creates sub-volumes.


Optimal Volume Sizes

For 'unstructured file-serving' your main scaling factor is likely to be how big your directories get. Once Upon a Time, 64K files/sub-directories in a single directory was very bad on EXT filesystems, but this has been fixed. Similarly, the total number of inodes in a filesystem could cause big scaling issues (8 million files didn't work so well on some filesystems).
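
If you want to see how big your directories actually get on an existing share, something like the following rough sketch can help. It only counts direct entries per directory; the function name `largest_directories` and the `top` parameter are made up for illustration, and on a tree with millions of files the walk itself will take a while.

```python
import os
from collections import Counter

def largest_directories(root, top=10):
    """Return the `top` directories under `root` with the most direct entries.

    Counts files plus sub-directories directly inside each directory.
    Directories that cannot be read are silently skipped by os.walk.
    """
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        counts[dirpath] = len(dirnames) + len(filenames)
    # most_common() returns (path, entry_count) pairs, largest first
    return counts.most_common(top)
```

Calling `largest_directories("/srv/data")` (path is hypothetical) returns a list of `(path, entry_count)` pairs, so you can spot the handful of directories that dominate before deciding how to split things up.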

Most of these have been engineered around by now. Even ext4, which I don't particularly like, can deal with tens of millions of files. Will it be fast? Eh, depends on what it is you're saving.
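
To get a feel for how many inodes a mounted filesystem has and how many are in use, here is a minimal sketch using `os.statvfs` from the Python standard library. The helper name `inode_usage` is just for illustration, and note that some filesystems (btrfs, for one) report zero total inodes because they allocate them dynamically, so the percentage is left as `None` in that case.

```python
import os

def inode_usage(path):
    """Report inode totals for the filesystem containing `path` (Unix only)."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes the filesystem reports
    free = st.f_ffree    # free inodes
    used = total - free
    # Filesystems with dynamic inode allocation may report total == 0
    percent = (used / total * 100.0) if total else None
    return {"total": total, "used": used, "free": free, "percent_used": percent}
```

Running `inode_usage("/srv/data")` (again, a hypothetical mount point) gives roughly the same numbers as `df -i` for that filesystem.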

Backup/Restore Considerations

The thing no one really thinks about. What are you doing for that? Are you relying just on LVM snapshots and RAID, exporting full copies somehow (tar to tape), or doing periodic syncs to a remote system via rsync or something? What you're doing will impact your filesystem choice.

XFS gives you xfsdump, which is a very good utility for backing up XFS filesystems. It is better and faster than tar since it stores fs-structures in the archive directly, where tar has to build POSIX abstractions that slow it down.

On magnetic media, extent-based filesystems do a bit better on backup as they're better at avoiding fragmentation. Xfsdump bypasses some of the frag problem due to its native tooling. Generally, inode performance will be an issue for touch-all-the-files styles of backup (tar and rsync).

sysadmin1138
  • Brilliant answer/explanation. Can I ask you to define "unstructured file serving"? I'd like to understand a bit more what you mean by this term. Also - to answer the backup part of the question - typically rsync for short term and longer term archiving to tape. – sardean Jan 24 '15 at 03:04
  • 'structured file-serving' is something like Sharepoint or another document management system. Unstructured is a big mass of excel, word, powerpoint, random images, and whatnot. – sysadmin1138 Jan 24 '15 at 03:19