Why does ZFS RAIDZ2 use only 2GB of storage when I create a 1GB file?

Question

I have created a ZFS RAIDZ2 (RAID 6) file system, which, as I understand it, stores parity on two disks.

root@zfs-demo:/data# zpool status
  pool: data
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0

errors: No known data errors

I have a 1GB file:

root@zfs-demo:/data# ls -alh
total 1023M
drwxr-xr-x  2 root root    3 Dec 17 18:22 .
drwxr-xr-x 19 root root 4.0K Dec 17 18:10 ..
-rw-r--r--  1 root root 1.0G Dec 17 18:22 1GB.bin

I thought the two parity disks would mean I was storing the file itself plus two lots of parity, i.e. 3GB of storage in total for a 1GB file, but only 2GB is allocated:

root@zfs-demo:/data# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
data  39.5G  2.01G  37.5G        -         -     0%     5%  1.00x    ONLINE  -

For four disks, you should probably be using RAIDZ1 or ZFS mirrors. RAIDZ2 doesn't offer much benefit for that small number of disks. — ewwhite, Dec 17 '22 at 20:38
@ewwhite Thank you, this is just a lab environment for me to learn more about ZFS. I will be blowing it all away once I have answered all my questions, one of which is this one. — PrestonDocks, Dec 17 '22 at 20:46
Simple reasoning without the need for any technical knowledge: you have 4 disks of 10 TB each, which in RAID-Z2 gives 20 TB usable and 20 TB lost to parity. So data and parity must be in a 1:1 proportion, or you could never fill the disks. If, as in your example, 1 GB of data used 2 GB of parity space, the parity space would be full after only 10 TB had been written, yet you have 20 TB usable. — Sunzi, Dec 18 '22 at 13:42
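
Sunzi's arithmetic can be sketched in a few lines of Python. This is a simplified model that assumes equal-size disks and ignores padding, metadata and reserved space; the function name `raidz_split` is hypothetical.

```python
# Simplified RAIDZ capacity split: equal-size disks, no padding, metadata
# or reserved space (assumptions for illustration only).
def raidz_split(disks: int, parity: int, disk_tb: float) -> tuple:
    """Return (usable_tb, parity_tb) for a single RAIDZ vdev."""
    raw_tb = disks * disk_tb
    usable_tb = raw_tb * (disks - parity) / disks
    return usable_tb, raw_tb - usable_tb


usable, parity_space = raidz_split(disks=4, parity=2, disk_tb=10)
print(usable, parity_space)  # 20.0 20.0

# Under the question's assumption (2 GB of parity per 1 GB of data), the
# 20 TB of parity space would be exhausted after only 10 TB of data.
print(parity_space / 2)  # 10.0
```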

Zac67 · Accepted Answer · 2022-12-18T12:29:16.167

With two out of four disks' worth of capacity used for redundancy, you can simply double the user data: two disks' worth of space stores the original data, and the same amount of space holds redundancy data. Parity is actually distributed across all disks by striping, but that doesn't change the amount of space taken up.

With this number of disks, you could use RAID 1/mirroring with the same space efficiency but better throughput (and less resilience, as Romeo Ninov has commented). RAID-Z2 and RAID 6 become more space-efficient with more disks: with ten disks in total, eight disks' worth of space can effectively be used for data while still only two are used for redundancy.
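
The "double the user data" rule can be refined with a sketch of per-block allocation. The function below is modelled on the RAIDZ allocation rule in OpenZFS as I understand it (data sectors, plus one parity sector per parity disk for each stripe row, padded to a multiple of parity + 1), so treat it as an approximation. The 128 KiB record size and `ashift=12` (4 KiB sectors) are assumptions, not values read from the pool in the question, and `raidz_asize` is a hypothetical name.

```python
# Sketch of raw space used by one block on a RAIDZ vdev (data + parity + padding).
# Modelled on OpenZFS's RAIDZ allocation rule as I understand it; an approximation.
def raidz_asize(psize: int, ashift: int, cols: int, nparity: int) -> int:
    """Raw bytes allocated for a block of psize bytes on a RAIDZ vdev of `cols` disks."""
    sectors = ((psize - 1) >> ashift) + 1       # data sectors, rounded up
    data_cols = cols - nparity
    rows = -(-sectors // data_cols)             # stripe rows needed (ceiling division)
    sectors += nparity * rows                   # nparity parity sectors per row
    pad_to = nparity + 1
    sectors = -(-sectors // pad_to) * pad_to    # pad to a multiple of nparity + 1
    return sectors << ashift


RECORD = 128 * 1024  # assumed 128 KiB record size
for disks in (4, 10):
    raw = raidz_asize(RECORD, ashift=12, cols=disks, nparity=2)
    print(disks, raw, raw / RECORD)
# 4 270336 2.0625
# 10 172032 1.3125
```

With these assumed values the 4-disk case comes out slightly above 2x because of padding, and a 10-disk RAIDZ2 needs about 1.31x rather than the ideal 1.25x. As far as I know, `zpool list` reports raw allocation including parity for RAIDZ pools, which is why ALLOC shows roughly twice the file size.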

edited Dec 18 '22 at 12:29

answered Dec 17 '22 at 23:03

Zac67

If by mirror you mean RAID10 (we are talking about 4 disks), you will be in trouble if two disks fail and they are part of the same submirror. RAID6, however, will survive any 2 missing disks without any problem. – Romeo Ninov Dec 18 '22 at 09:16
@RomeoNinov Yes, Z2/RAID6 are more resilient than RAID1/10, but the latter are almost always faster, including during rebuilds. – Zac67 Dec 18 '22 at 10:45
While efficiency for parity RAID improves with more disks, effective resilience is inversely proportional to the code rate (in other words, as the ratio of data disks to total disks approaches unity, the probability of losing enough disks at once to lose the whole array also approaches unity). This is part of why RAID-Z3 exists, and also why the norm for very large ZFS pools is to use multiple smaller arrays instead of one very big array. Most sensible admins would not consider using ten disks for a single RAID6 array instead of just building two five-disk RAID5 arrays (or even two five-disk RAID6 arrays). – Austin Hemmelgarn Dec 18 '22 at 15:07
@AustinHemmelgarn, taking into account the MTBF and size of contemporary disks, the time to recover an array after a single disk failure is starting to become longer than the MTBF (very simplified), so RAID5 has become highly inadvisable. And you are right, this is the reason RAIDZ3 exists. – Romeo Ninov Dec 18 '22 at 15:28
@AustinHemmelgarn Similar to RAID10 vs RAID6 from Romeo's comment, RAID50 is not as resilient as RAID6 with the same number of disks. – Zac67 Dec 18 '22 at 15:40
And to add to Zac67's comment: most of the time, RAIDZ2 plus a spare helps increase resilience :) – Romeo Ninov Dec 18 '22 at 15:59
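
The resilience trade-offs in these comments can be compared with a toy binomial model. It assumes every disk fails independently with the same probability `q` before a rebuild finishes; `q = 0.05` is an arbitrary illustrative value, and the model ignores rebuild time, correlated failures and read errors, which is exactly where the real-world arguments above come in. The function names are hypothetical.

```python
# Toy model: each disk fails independently with probability q before a rebuild
# completes. Rebuild time, correlated failures and read errors are ignored.
from math import comb


def p_at_least(k: int, n: int, q: float) -> float:
    """Probability that at least k of n disks fail."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))


def p_vdev_loss(disks: int, tolerated: int, q: float) -> float:
    """A vdev (RAIDZ or mirror) is lost once more than `tolerated` disks fail."""
    return p_at_least(tolerated + 1, disks, q)


def p_pool_loss(vdevs: int, per_vdev: float) -> float:
    """A pool striped across independent vdevs is lost if any one vdev is lost."""
    return 1 - (1 - per_vdev) ** vdevs


q = 0.05  # arbitrary illustrative value
layouts = {
    "4-disk RAIDZ2": p_vdev_loss(4, 2, q),
    "2 x 2-way mirror (RAID10)": p_pool_loss(2, p_vdev_loss(2, 1, q)),
    "10-disk RAIDZ2": p_vdev_loss(10, 2, q),
    "2 x 5-disk RAIDZ1 (RAID50)": p_pool_loss(2, p_vdev_loss(5, 1, q)),
    "2 x 5-disk RAIDZ2 (RAID60)": p_pool_loss(2, p_vdev_loss(5, 2, q)),
}
for name, p_loss in layouts.items():
    print(f"{name:28s} {p_loss:.5f}")
```

Under this toy model the 4-disk RAIDZ2 is less likely to be lost than two 2-way mirrors, a 10-disk RAIDZ2 beats two 5-disk RAIDZ1 vdevs, and two 5-disk RAIDZ2 vdevs beat a single 10-disk RAIDZ2.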

Romeo Ninov · Answer 2 · 2022-12-17T20:58:34.827

The situation is this (very simplified, just to convey the idea):

Let's suppose ZFS uses 512MB blocks. You store 512MB (the first part of the file) on disk 1 and the next 512MB on disk 2. On parity 1 you store another 512MB block (so you can restore the file from disk 1 and parity 1 alone, for example), and on parity 2 you store another 512MB block, so you can also restore the file from disk 1 and parity 2.

Any one of these pairs is enough to be up and running with the entire file:

d1+d2
d1+p1
d1+p2
d2+p1
d2+p2
p1+p2
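
The pair list can be checked with a toy RAID-6-style P+Q code in Python: two data shards, an XOR parity P and a Reed-Solomon-style parity Q over GF(2^8). The field polynomial 0x11d and generator 2 are values commonly used for RAID-6 Q parity; this illustrates the idea and is not ZFS's actual RAIDZ2 on-disk layout. All names are hypothetical.

```python
# Toy RAID-6-style P+Q parity: two data shards (d1, d2) plus two parity shards
# (p, q). Any two of the four shards are enough to rebuild the data.
# Assumptions: GF(2^8) with polynomial 0x11d and generator 2; not ZFS's layout.
from itertools import combinations


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse by brute force (fine for 256 elements)."""
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)


def encode(data: bytes) -> dict:
    """Split data into halves d1, d2 and compute p = d1 ^ d2, q = d1 ^ 2*d2."""
    half = len(data) // 2  # assumes an even length for simplicity
    d1, d2 = data[:half], data[half:]
    p = bytes(a ^ b for a, b in zip(d1, d2))
    q = bytes(a ^ gf_mul(2, b) for a, b in zip(d1, d2))
    return {"d1": d1, "d2": d2, "p": p, "q": q}


def decode(shards: dict) -> bytes:
    """Rebuild d1 + d2 from any two surviving shards."""
    d1, d2 = shards.get("d1"), shards.get("d2")
    p, q = shards.get("p"), shards.get("q")
    if d1 is not None and d2 is None:
        if p is not None:
            d2 = bytes(a ^ b for a, b in zip(p, d1))                 # d2 = p ^ d1
        else:
            inv2 = gf_inv(2)
            d2 = bytes(gf_mul(a ^ b, inv2) for a, b in zip(q, d1))  # d2 = (q ^ d1) / 2
    elif d2 is not None and d1 is None:
        if p is not None:
            d1 = bytes(a ^ b for a, b in zip(p, d2))                 # d1 = p ^ d2
        else:
            d1 = bytes(a ^ gf_mul(2, b) for a, b in zip(q, d2))      # d1 = q ^ 2*d2
    elif d1 is None and d2 is None:
        inv3 = gf_inv(3)
        d2 = bytes(gf_mul(a ^ b, inv3) for a, b in zip(p, q))       # p ^ q = 3*d2
        d1 = bytes(a ^ b for a, b in zip(p, d2))
    return d1 + d2


if __name__ == "__main__":
    shards = encode(b"ZFS!")
    print(sum(len(s) for s in shards.values()))  # 8 bytes stored for 4 bytes of data
    for pair in combinations(shards, 2):
        print(pair, decode({name: shards[name] for name in pair}))
```

Every pair in the list rebuilds the data, and the four shards together are twice the size of the input, which is the 2x overhead from the question.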

If, for example, you have 5 disks (RAIDZ2) and 333MB blocks, you will have such blocks on disks 1, 2 and 3 and on parity 1 and 2: about 1666MB in total.
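
Both figures follow from one ratio: in this simplified model a file takes its size times the vdev width, divided by the number of data disks. A minimal sketch (the helper name is hypothetical, and padding and metadata are ignored):

```python
# Simplified model: raw space = file size * width / data disks.
# Padding and metadata are ignored (assumption for illustration).
def raw_usage_mb(file_mb: float, disks: int, parity: int) -> float:
    """Raw space used by a file on one RAIDZ vdev."""
    return file_mb * disks / (disks - parity)


print(raw_usage_mb(1000, disks=4, parity=2))            # 2000.0
print(round(raw_usage_mb(1000, disks=5, parity=2), 1))  # 1666.7
```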

edited Dec 17 '22 at 20:58

answered Dec 17 '22 at 20:53

Romeo Ninov

Thank you, but I'm not sure I understand your answer in relation to my question: why is a 1GB file using only 2GB of storage space? If disk 1 stores the 1GB file, disk 2 stores 1GB of parity and disk 3 stores another 1GB of parity, shouldn't I see a total disk usage of 3GB? – PrestonDocks Dec 17 '22 at 21:00
@PrestonDocks, please read my answer. D1 stores half of the file and D2 stores the other half. P1 stores parity the size of half the file, and so does P2. – Romeo Ninov Dec 17 '22 at 21:01
@PrestonDocks, yes, this is how it is stored. And no, the parity is calculated so that the array can restore the file from the original data and the parities on P1 and P2. See the pairs in my answer, any of which is sufficient to reconstruct the file. – Romeo Ninov Dec 17 '22 at 21:11