ZFS: re-compress existing files after change in compression algorithm

Question

I have a pool that was created in 2011, using lzjb compression, and it wasn't until a couple of years later that an upgrade allowed me to set the compression to lz4. I estimate that at least 20% of the content (by space) on the array was created prior to 2013, which means it's still compressed using lzjb.

I can think of a couple of options to fix this and regain (some) space:

1. Back up and restore to a new pool. Not really practical, as I do not have sufficient redundant storage to hold the temporary copy. The restore would also require the pool to be offline for several hours.
2. Write a script to re-copy any file with a timestamp older than 2013. Potentially risky, especially if it chokes on spaces or other special characters and ends up mangling the original name.

Is there some way to get ZFS to re-compress any legacy blocks using the current compression algorithm? Kind of like a scrub, but healing the compression.

A related question: is there some way to see the usage of each type of compression algorithm? zdb just shows overall compression stats, rather than breaking them down into individual algorithms.

I'm pretty sure you named the only two options. See also the discussion in [issue 3013](https://github.com/zfsonlinux/zfs/issues/3013) for why this functionality doesn't exist and you might not want to do this at all. — Michael Hampton, Oct 01 '18 at 02:14
lz4 is supposedly _at most_ 10% better on compressing than lzjb. If 20% of your data can be compressed 10% better you'll get at most 2% more free space. Is it worth it? — pipe, Oct 01 '18 at 12:01
If you write a shell script to do the copy, add `export LC_ALL=C` to the beginning of the script, and all non-ASCII special characters in filenames will be kept intact. Keeping whitespace and dash intact is trickier, use double quotes and `--`, e.g. `cp -- "$SOURCE" "$TARGET"`. — pts, Oct 01 '18 at 12:57
@pipe Space is one (very) small advantage, but I'm more interested in decompression speed. From the FreeBSD zpool-features manpage: "Typically, lz4 compression is approximately 50% faster on compressible data and 200% faster on incompressible data than lzjb. It is also approximately 80% faster on decompression, while giving approximately 10% better compression ratio." — rowan194, Oct 01 '18 at 13:13
@pts I wouldn't call obeying fundamental shell programming rules (double quotes around variables or using `--`) "trickier". That's as important as avoiding SQL injection, for example. — glglgl, Oct 01 '18 at 14:52

score 16 · Answer 1 · answered Oct 01 '18 at 02:28


You'd have to recopy the data (in full or in part), or zfs send/receive the data to a new pool or ZFS filesystem.

There aren't any other options.


ewwhite


score 0 · Answer 2 · answered Aug 11 '23 at 15:30

Due to the way ZFS is designed, re-writing the data is the only way to go, usually by copying into a temporary file and renaming it over the original.
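A minimal sketch of that copy-and-rename approach, to make the mechanics concrete. It is not the code of either script mentioned in this answer, and the names `rewrite_in_place`, `old_files` and `recompress_tree` are made up for illustration. It assumes that the dataset's `compression` property is already set to the new algorithm, that the temporary file is created in the same directory (so the rename stays within one filesystem), and that nothing else has the files open while it runs. It uses a plain read/write loop rather than a higher-level copy helper, on the assumption that this forces the data through the normal write path. It skips hard-linked files and does not preserve ownership, ACLs or extended attributes.

```python
import os
import shutil
import stat
import tempfile
from datetime import datetime

CHUNK = 1 << 20  # read 1 MiB at a time; the size is an arbitrary choice


def rewrite_in_place(path):
    """Copy `path` to a temp file beside it, then rename the copy over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".rewrite-")
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            while True:
                chunk = src.read(CHUNK)
                if not chunk:
                    break
                dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())  # make sure the new copy is on disk first
        shutil.copystat(path, tmp)  # carry over mode bits and timestamps
        os.replace(tmp, path)       # rename over the original
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def old_files(root, cutoff):
    """Yield regular, non-hard-linked files under `root` with mtime before `cutoff`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat so symlinks are not followed
            if stat.S_ISREG(st.st_mode) and st.st_nlink == 1 and st.st_mtime < cutoff:
                yield full


def recompress_tree(root, cutoff):
    """Rewrite every file selected by old_files(); returns the number rewritten."""
    count = 0
    for path in old_files(root, cutoff):
        rewrite_in_place(path)
        count += 1
    return count
```

For the pool in the question, a call would look something like `recompress_tree("/tank/data", datetime(2013, 1, 1).timestamp())`, where `/tank/data` is a placeholder mount point. Because file names are passed around as plain strings and never go through a shell, spaces and leading dashes need no special handling. Two things to keep in mind: the modification time is preserved, so a second run selects the same files again, and any snapshot that still references the old blocks keeps them allocated, so space is only regained once those snapshots are destroyed.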

There's gary17/zfs-recompress (I'm not affiliated), a shell script, that does precisely this job.

I found that script very satisfactory in terms of results, but single-threaded processing was slow for me, so I rewrote it in Python for better performance and (possibly) better safety with regard to file names. My Python script is at iBug/zfs-recompress.py, which you might want to try.

Make sure to back up your data (or take a zfs snapshot) before trying these tools.
