
I have 20x 12TB SATA disks in a file server with a Broadcom/LSI MegaRAID SAS39xx RAID controller. It's running Ubuntu. For a performance experiment, I created two storage pools:

- A single hardware RAID6 array of 10 disks, i.e. effectively 8 data disks + 2 parity disks. I then pooled and mounted this using ZFS. This pool was named hardA.

- 10 single-disk "RAID" volumes, from which I then created a ZFS raidz2 pool. This pool was named softB.

Surprisingly (to me, anyway), softB comes out ~5TB smaller than hardA (83TB instead of 87TB; my maths says 87TB is the expected value). My logic says the overheads for these two solutions should be the same, or very close. Could anyone please shed some light on where the discrepancy is, and whether there is anything I can do to fix it?
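
For reference, that expected value is just the eight data disks' worth of raw capacity converted into the binary units zfs/zpool report; a quick sanity check:

```python
# Expected usable size: 8 data disks x 12 TB, converted from decimal TB to TiB,
# since zfs/zpool report binary units.
data_disks = 8
disk_bytes = 12_000_000_000_000  # 12 TB per disk
print(f"{data_disks * disk_bytes / 2**40:.1f} TiB")  # -> 87.3 TiB
```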

(As an aside: in the limited testing I've done so far, the performance of softB was more than double that of hardA in pretty much every test I ran in fio, but that's not the point. Work in progress. And yes, ashift autodetected to 12.)

Random diagnostics follow:

# zfs list
NAME         USED  AVAIL     REFER  MOUNTPOINT
hardA       15.9M  87.2T       96K  /hardA
hardA/data    96K  87.2T       96K  /store/hardA
softB       24.2M  83.0T      219K  /softB
softB/data   219K  83.0T      219K  /store/softB
# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
hardA  87.3T  16.1M  87.3T        -         -     0%     0%  1.00x    ONLINE  -
softB   109T  31.8M   109T        -         -     0%     0%  1.00x    ONLINE  -
# zpool status
  pool: hardA
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        hardA       ONLINE       0     0     0
          sdc       ONLINE       0     0     0

errors: No known data errors

  pool: softB
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        softB       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdj     ONLINE       0     0     0
            sdk     ONLINE       0     0     0
            sdl     ONLINE       0     0     0
            sdm     ONLINE       0     0     0

errors: No known data errors
# zpool iostat -v
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
hardA       16.1M  87.3T      8     67  1.09M  51.6M
  sdc       16.1M  87.3T      8     67  1.09M  51.6M
----------  -----  -----  -----  -----  -----  -----
softB       31.8M   109T     54    477   867K  70.0M
  raidz2-0  31.8M   109T     54    477   867K  70.0M
    sdd         -      -      5     49  86.8K  7.00M
    sde         -      -      5     46  86.7K  7.00M
    sdf         -      -      5     51  86.7K  7.00M
    sdg         -      -      5     46  86.6K  7.00M
    sdh         -      -      5     48  86.5K  7.00M
    sdi         -      -      5     47  86.6K  7.00M
    sdj         -      -      5     48  86.8K  7.00M
    sdk         -      -      5     47  86.9K  7.00M
    sdl         -      -      5     47  86.6K  7.01M
    sdm         -      -      5     45  86.7K  7.01M
----------  -----  -----  -----  -----  -----  -----
# df -h
Filesystem                         Size  Used Avail Use% Mounted on
--snip--
hardA                               88T  128K   88T   1% /hardA
softB                               83T  256K   83T   1% /softB
hardA/data                          88T  128K   88T   1% /store/hardA
softB/data                          83T  256K   83T   1% /store/softB
--snip--
# zfs version
zfs-2.1.4-0ubuntu0.1
zfs-kmod-2.1.2-1ubuntu3
Thingomy

1 Answer


RAIDZ2 isn't the same thing as RAID 6... Similar, but not the same. There are different overheads to consider.

For ZFS, the spa_slop_shift parameter reserves slop space, by default about 3.2% (1/32) of the pool, for pool operations and as a cushion to prevent out-of-space conditions. This can be adjusted, and should be considered for large zpools.
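
On Linux you can read (and tune) this as a zfs kernel-module parameter; a minimal sketch, assuming the module exposes it at the usual sysfs path:

```python
# Minimal sketch: read spa_slop_shift from the zfs module parameters (path
# assumed to be the standard Linux sysfs location) and show the nominal
# fraction of pool space it reserves, 1 / 2^spa_slop_shift.
with open("/sys/module/zfs/parameters/spa_slop_shift") as f:
    shift = int(f.read())
print(f"spa_slop_shift = {shift} -> reserves ~{100 / 2**shift:.2f}% of the pool")
```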

There are a lot more mechanics involved in predicting ZFS usable capacity. See: https://wintelguy.com/2017/zfs-storage-overhead.html

In many cases, the number of devices in each vdev, as well as the number of vdevs, can impact the results.
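
For this particular layout (10-wide raidz2, ashift=12), here is a rough back-of-the-envelope sketch of the raidz allocation accounting that page describes, assuming 128 KiB blocks and 4 KiB sectors; it is an approximation of how the reported capacity gets deflated, not ZFS's exact internal code:

```python
import math

def raidz_alloc_sectors(data_sectors, width, nparity):
    """Approximate sectors a raidz vdev allocates per block: data plus parity,
    rounded up to a multiple of (nparity + 1)."""
    parity = math.ceil(data_sectors / (width - nparity)) * nparity
    return math.ceil((data_sectors + parity) / (nparity + 1)) * (nparity + 1)

width, nparity, sector = 10, 2, 4096           # 10-wide raidz2, ashift=12
data_sectors = 128 * 1024 // sector            # 32 x 4 KiB sectors per 128 KiB block
alloc = raidz_alloc_sectors(data_sectors, width, nparity)  # 42 sectors

raw_tib = 109.0                                # zpool list SIZE for softB
print(f"{alloc} sectors allocated per {data_sectors}-sector block")
print(f"usable ~ {raw_tib * data_sectors / alloc:.1f} TiB "
      f"(naive 8/10 estimate: {raw_tib * (width - nparity) / width:.1f} TiB)")
```

Under those assumptions the estimate lands near the 83T that `zfs list` reports for softB, while the naive 8-out-of-10 figure gives the ~87T you expected; the hardware RAID 6 pool avoids that particular overhead because ZFS sees it as a single plain vdev.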

ewwhite
  • You may be onto something here. But given that both volumes are ultimately zpools, I'd have expected the 3.2% to affect both. Is it somehow not a thing for single-disk pools, but is for raidz2? And anyway, 87.2TB - 3.2% ends up being 84.4TB rather than the 83TB reported; there's still 1.4TB missing. – Thingomy Jul 23 '22 at 10:20
  • You're starting with an assumption that these capacities should be equal. Why is that? [ZFS space calculations](https://wintelguy.com/2017/zfs-storage-overhead.html) are often dependent on the number, composition and layout of the vdevs. The number of drives DOES matter. – ewwhite Jul 23 '22 at 12:04
  • I prefer the term hypothesis, and I’m here to have it challenged. I’m looking to gain an understanding of what’s going on in enough detail to make a decision on which configuration is best for final deployment, as well as a deeper understanding in general. Your link is an excellent resource, thank you, it’s a great starting point, but reading through it, the only factor in there that would appear to affect more than ~1% of space is the slop space reservation, and (if I’m reading it right) this should affect both pools equally. I’m seeing 4.8%. – Thingomy Jul 24 '22 at 11:05
  • The business-minded and professional answer is that the space discrepancy shouldn't matter at this scale, and that it's prudent to plan for growth and expansion. If you're filling this array up to the point where 5TB of space is a concern, you should provision more storage space with greater headroom. – ewwhite Jul 24 '22 at 21:57