I’m trying to understand why there is a large discrepancy between the used space of a production ZFS dataset and that of a backup dataset populated by a nightly zfs send (I keep 30 daily snapshots and replicate nightly; no other systems write to or otherwise access the backup dataset). Compression and deduplication are not enabled on either side. The backup dataset reports 315T used while production uses only 311T (the two systems are essentially mirrored in terms of hardware). My issue is that the nightly zfs sends are now failing with out-of-space errors.
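If it helps with diagnosis, the usedby* breakdown on each side can be pulled with something like the following (a sketch; these properties split a dataset's USED figure into its own data, its snapshots, and its children):

# zfs get -r usedbydataset,usedbysnapshots,usedbychildren data/lab
# zfs get -r usedbydataset,usedbysnapshots,usedbychildren backup/lab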
A follow-up question: is there an obvious way out of this situation? The backup pool shows 10.7T free, but that space doesn’t seem to be available to the dataset, which reports only 567G free. If I were to destroy the backup pool and perform a full zfs send of the production data, would we expect it to complete? I've already destroyed all but the two most recent snapshots on the backup dataset, but that didn't free enough space to allow a new zfs send. I purposely set a quota of 312T on the production dataset to help keep users in check, as they'll often work near 100% full, but it seems that quota may not have been enough. (There is no quota defined on the backup pool or dataset.) A sketch of the commands involved follows.
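For estimating reclaimable space before destroying further snapshots, a dry-run destroy over a snapshot range reports how much would be freed, and the pool's freeing property shows whether space from already-destroyed snapshots is still being released asynchronously (snapshot names below are placeholders, not my actual naming scheme):

# zfs destroy -nv backup/lab@daily-01%daily-28
# zpool get freeing backup

And the full re-send I'm considering would look roughly like this (hostname and snapshot name are placeholders):

# zfs send -R data/lab@daily-30 | ssh backuphost zfs receive -F backup/lab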
Production system:
# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
data   326T   311T  15.3T         -   44%  95%  1.00x  ONLINE  -
# zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
data       311T  5.11T    96K  /data
data/lab   311T  1.30T   306T  /data/lab
Backup system:
# zpool list
NAME     SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
backup   326T   315T  10.7T         -    6%  96%  1.00x  ONLINE  -
# zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
backup       315T   567G    96K  /backup
backup/lab   315T   567G   315T  /backup/lab
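In case it helps explain the gap between the pool's 10.7T FREE and the dataset's 567G AVAIL, the space-limiting properties on the backup side can be checked with something like:

# zfs get quota,refquota,reservation,refreservation,available backup backup/lab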