When I run this query on a dataset, I see the following stats:

zfs list -d 1 -t all -o name,used,refer,written,compressratio sfg-backup/mx
NAME                                           USED     REFER  WRITTEN  RATIO
sfg-backup/mx                                  300G      276G        0  1.80x
sfg-backup/mx@madcow_2023-04-15_23:15:00_UTC  4.04G      275G     275G  1.28x
...
sfg-backup/mx@madcow_2023-04-21_01:15:00_UTC     0B      276G        0  1.28x
sfg-backup/mx@madcow_2023-04-21_02:15:00_UTC     0B      276G    4.26G  1.28x
sfg-backup/mx@madcow_2023-04-21_03:15:00_UTC     0B      276G        0  1.28x

However, when I run a backup whose last snapshot on the target is madcow_2023-04-21_01:15:00_UTC, the size of the transfer is not 4.26GB but 31.4GB:

syncoid --no-sync-snap 10.0.1.2:sfg-backup/mx work/sfg/mx
NEWEST SNAPSHOT: madcow_2023-04-21_03:15:00_UTC
Sending incremental sfg-backup/mx@madcow_2023-04-21_01:15:00_UTC ... madcow_2023-04-21_03:15:00_UTC (~ 31.4 GB):
31.5GiB 0:03:16 [ 163MiB/s] [==================================================================================================>] 100%

Adding -c (compressed send) brings the size down to 4.3G (these are slightly different snapshots, but with more or less the same content):

zfs send -nv -c -I sfg-backup/mx@madcow_2023-04-24_00:15:00_UTC sfg-backup/mx@madcow_2023-04-24_03:15:00_UTC
send from @madcow_2023-04-24_00:15:00_UTC to sfg-backup/mx@madcow_2023-04-24_01:15:00_UTC estimated size is 215M
send from @madcow_2023-04-24_01:15:00_UTC to sfg-backup/mx@madcow_2023-04-24_02:15:00_UTC estimated size is 4.09G
send from @madcow_2023-04-24_02:15:00_UTC to sfg-backup/mx@madcow_2023-04-24_03:15:00_UTC estimated size is 624B
total estimated size is 4.30G

# without -c flag:
zfs send -nv -I sfg-backup/mx@madcow_2023-04-24_00:15:00_UTC sfg-backup/mx@madcow_2023-04-24_03:15:00_UTC
send from @madcow_2023-04-24_00:15:00_UTC to sfg-backup/mx@madcow_2023-04-24_01:15:00_UTC estimated size is 216M
send from @madcow_2023-04-24_01:15:00_UTC to sfg-backup/mx@madcow_2023-04-24_02:15:00_UTC estimated size is 31.3G
send from @madcow_2023-04-24_02:15:00_UTC to sfg-backup/mx@madcow_2023-04-24_03:15:00_UTC estimated size is 624B
total estimated size is 31.5G

Can you help me understand what causes this large discrepancy in sizes? Why is the compression ratio reported by ZFS only 1.28x, while the ratio between the uncompressed and compressed transfer sizes is 31.5/4.3 ≈ 7.3?
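
A rough way to see where the 7.3x comes from is to compare the two dry runs increment by increment. The Python sketch below parses the `estimated size` lines pasted from the `zfs send -nv` output above; it assumes that line format and that zfs's single-letter suffixes are powers of 1024. The names `parse`, `UNITS` and `LINE` are made up for illustration.

```python
import re

# "estimated size" lines copied from the two dry runs above
COMPRESSED = """\
send from @madcow_2023-04-24_00:15:00_UTC to sfg-backup/mx@madcow_2023-04-24_01:15:00_UTC estimated size is 215M
send from @madcow_2023-04-24_01:15:00_UTC to sfg-backup/mx@madcow_2023-04-24_02:15:00_UTC estimated size is 4.09G
send from @madcow_2023-04-24_02:15:00_UTC to sfg-backup/mx@madcow_2023-04-24_03:15:00_UTC estimated size is 624B
"""
UNCOMPRESSED = """\
send from @madcow_2023-04-24_00:15:00_UTC to sfg-backup/mx@madcow_2023-04-24_01:15:00_UTC estimated size is 216M
send from @madcow_2023-04-24_01:15:00_UTC to sfg-backup/mx@madcow_2023-04-24_02:15:00_UTC estimated size is 31.3G
send from @madcow_2023-04-24_02:15:00_UTC to sfg-backup/mx@madcow_2023-04-24_03:15:00_UTC estimated size is 624B
"""

# Assumption: suffixes are binary multiples, as zfs prints them
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
LINE = re.compile(r"^send from @(\S+) to \S+@(\S+) estimated size is ([\d.]+)([BKMGT])$")


def parse(text):
    """Map (from_snapshot, to_snapshot) -> estimated size in bytes."""
    sizes = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            frm, to, num, unit = m.groups()
            sizes[(frm, to)] = float(num) * UNITS[unit]
    return sizes


comp = parse(COMPRESSED)
raw = parse(UNCOMPRESSED)

# Per-increment ratio of uncompressed to compressed stream size
for frm, to in comp:
    print(f"{frm} -> {to}: {raw[(frm, to)] / comp[(frm, to)]:.2f}x")
print(f"total: {sum(raw.values()) / sum(comp.values()):.2f}x")
```

Only the 01:15 → 02:15 increment is strongly compressible (about 7.65x); the other two increments come out at roughly 1.0x, so the overall ~7.33x is driven almost entirely by that single step.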

dimus

1 Answer

WRITTEN shows the compressed (on-disk) size of the data actually written to the dataset/snapshot. Between sfg-backup/mx@madcow_2023-04-21_01:15:00_UTC and madcow_2023-04-21_03:15:00_UTC you wrote highly compressible data on top of older, poorly compressible data, without de-referencing the entire file.

I suppose you have some big file that can be overwritten at random offsets (e.g., virtual disk images, databases, etc.), and it just happened that you wrote 32G of raw data, which became 4G of compressed data.

zfs send -c sends the compressed records as they are, transferring only the 4G compressed delta. Plain zfs send (without -c), on the other hand, decompresses the on-disk data, expanding it to the full 32G.
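
A toy model makes the three numbers easier to line up: give every record a logical (uncompressed) size and a physical (on-disk) size. WRITTEN and `zfs send -c` count physical bytes, while plain `zfs send` counts logical bytes. The `Record` and `send_size` names, the 128 KiB record size and the uniform 8:1 compression are hypothetical, chosen only to mimic the ~32G vs ~4G figures; real data compresses unevenly and send streams carry some extra metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    lsize: int  # logical (uncompressed) size in bytes
    psize: int  # physical (compressed, on-disk) size in bytes


def send_size(records, compressed):
    """Bytes in the stream: physical sizes with -c, logical sizes without."""
    return sum(r.psize if compressed else r.lsize for r in records)


# Hypothetical delta: 32 GiB of 128 KiB records, each compressing 8:1
RECSIZE = 128 * 1024
n = 32 * 1024**3 // RECSIZE
# One immutable record repeated n times keeps the model cheap in memory
delta = [Record(lsize=RECSIZE, psize=RECSIZE // 8)] * n

GiB = 1024**3
print(f"WRITTEN / zfs send -c: {send_size(delta, compressed=True) / GiB:.1f} GiB")
print(f"plain zfs send:        {send_size(delta, compressed=False) / GiB:.1f} GiB")
```

The on-disk footprint of the delta (what WRITTEN reports) and the `-c` stream are the same 4 GiB, while the plain stream carries the full 32 GiB of decompressed data.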

shodanshok
  • Thank you for the reply, @shodanshok. So how can the difference between the compression ratio reported by ZFS (1.28x) and the ratio between the compressed and uncompressed `send` sizes (7.3x) be explained? – dimus Apr 27 '23 at 18:36
  • 1.28x is the `compressratio` of the *entire referenced data (276G)*. 1.28 × 276G ≈ 353G of logical data, so there is ample margin to expand your 4.3G into >30G. – shodanshok Apr 27 '23 at 20:52
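
The arithmetic in the last comment can be pushed one step further: the overall ratio is a size-weighted blend, so a small, highly compressible slice barely moves it. This Python sketch assumes the 4.3G/31.5G delta is part of the 276G referenced data; strictly, those send estimates come from the 04-24 snapshots and RATIO is rounded, so the results are approximate. On OpenZFS the `logicalreferenced` property should report the logical figure directly.

```python
# Figures from the thread, in G as printed by zfs
refer_physical = 276.0  # REFER of the snapshot
compressratio = 1.28    # RATIO of the snapshot
delta_physical = 4.30   # total of zfs send -nv -c
delta_logical = 31.5    # total of zfs send -nv (no -c)

# Logical (uncompressed) size of everything the snapshot references
refer_logical = refer_physical * compressratio

# Assumption: the delta sits inside the referenced data; "rest" is the remainder
rest_logical = refer_logical - delta_logical
rest_physical = refer_physical - delta_physical

print(f"logical data referenced: ~{refer_logical:.0f}G")
print(f"delta ratio:   {delta_logical / delta_physical:.2f}x")
print(f"rest ratio:    {rest_logical / rest_physical:.2f}x")
print(f"blended ratio: {refer_logical / refer_physical:.2f}x")
```

Under these assumptions the remaining ~272G only needs to compress at about 1.18x for the whole snapshot to show 1.28x, while the changed data itself compresses at about 7.3x.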