
I recently created a ZFS volume to test its compression capabilities, comparing it side by side with an ext4 volume. After creating the new volume and turning compression on with sudo zfs set compression=gzip postgres-zfs, I copied a ~3GB file from the ext4 volume to the ZFS volume, but the file is exactly the same size on the ZFS drive (I used ls -alh to check). I gzipped the file manually to get a ballpark for what the compression should be (I understand there are different levels), and with a plain gzip file the file size was cut in half. My ZFS settings also show compression is turned on:

# zfs get all
NAME          PROPERTY              VALUE                  SOURCE
postgres-zfs  type                  filesystem             -
postgres-zfs  creation              Thu Apr  5 17:17 2018  -
postgres-zfs  used                  1.54G                  -
postgres-zfs  available             143G                   -
postgres-zfs  referenced            1.54G                  -
postgres-zfs  compressratio         1.34x                  -
postgres-zfs  mounted               yes                    -
postgres-zfs  quota                 none                   default
postgres-zfs  reservation           none                   default
postgres-zfs  recordsize            128K                   default
postgres-zfs  mountpoint            /postgres-zfs          default
postgres-zfs  sharenfs              off                    default
postgres-zfs  checksum              on                     default
postgres-zfs  compression           gzip                   local
postgres-zfs  atime                 on                     default
postgres-zfs  devices               on                     default
postgres-zfs  exec                  on                     default
postgres-zfs  setuid                on                     default
postgres-zfs  readonly              off                    default
postgres-zfs  zoned                 off                    default
postgres-zfs  snapdir               hidden                 default
postgres-zfs  aclinherit            restricted             default
postgres-zfs  canmount              on                     default
postgres-zfs  xattr                 on                     default
postgres-zfs  copies                1                      default
postgres-zfs  version               5                      -
postgres-zfs  utf8only              off                    -
postgres-zfs  normalization         none                   -
postgres-zfs  casesensitivity       sensitive              -
postgres-zfs  vscan                 off                    default
postgres-zfs  nbmand                off                    default
postgres-zfs  sharesmb              off                    default
postgres-zfs  refquota              none                   default
postgres-zfs  refreservation        none                   default
postgres-zfs  primarycache          all                    default
postgres-zfs  secondarycache        all                    default
postgres-zfs  usedbysnapshots       0                      -
postgres-zfs  usedbydataset         1.54G                  -
postgres-zfs  usedbychildren        132K                   -
postgres-zfs  usedbyrefreservation  0                      -
postgres-zfs  logbias               latency                default
postgres-zfs  dedup                 off                    default
postgres-zfs  mlslabel              none                   default
postgres-zfs  sync                  standard               default
postgres-zfs  refcompressratio      1.34x                  -
postgres-zfs  written               1.54G                  -
postgres-zfs  logicalused           2.07G                  -
postgres-zfs  logicalreferenced     2.07G                  -
postgres-zfs  filesystem_limit      none                   default
postgres-zfs  snapshot_limit        none                   default
postgres-zfs  filesystem_count      none                   default
postgres-zfs  snapshot_count        none                   default
postgres-zfs  snapdev               hidden                 default
postgres-zfs  acltype               off                    default
postgres-zfs  context               none                   default
postgres-zfs  fscontext             none                   default
postgres-zfs  defcontext            none                   default
postgres-zfs  rootcontext           none                   default
postgres-zfs  relatime              on                     temporary
postgres-zfs  redundant_metadata    all                    default
postgres-zfs  overlay               off                    default

Any idea why this data is not being stored compressed?

Tony
  • Want to see if your data is compressed? Try `dd if=/dev/zero of=test.dat bs=1M` and test how fast it writes. Then run `du -h test.dat`. – Andrew Henle Apr 05 '18 at 23:15
  • By the way, please use lz4 compression instead of gzip for this dataset. Gzip will kill your performance. – ewwhite Apr 06 '18 at 12:02
  • @ewwhite i'll do some benchmarks. citus measured faster performance with gzip - https://www.citusdata.com/blog/2013/04/30/zfs-compression/ – Tony Apr 07 '18 at 17:17
  • I promise, unless you are offloading gzip to a hardware accelerator somehow, LZ4 will be way, way faster. – Dan Nov 06 '18 at 21:47

1 Answer


The data is compressed; the OS just can't see the compression through normal commands, because files are transparently decompressed when you access them.

In that list of ZFS settings you can see an entry called compressratio, which in your case reads 1.34x. It shows how efficiently the files were compressed (on average):
compressed size * compressratio = uncompressed size

You can also see used and logicalused, which display the absolute compressed size and absolute uncompressed size of the whole dataset (although logicalused doesn't seem to match up with the mentioned filesize of the test file).
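As a quick sanity check, plugging the numbers from the `zfs get all` output above (used 1.54G, compressratio 1.34x) into that formula lines up with the reported logicalused of 2.07G:

```shell
# used * compressratio should approximate logicalused
awk 'BEGIN { printf "%.2f\n", 1.54 * 1.34 }'   # prints 2.06, i.e. ~2.07G
```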

You can find more information about those values here.

I also put together a short list containing all the commands and what they output:
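For example, a sketch using a throwaway file of zeros (paths and sizes are illustrative; on the gzip-compressed dataset `du -h` will report far less than the other two commands, while on ext4 all three sizes match, since there is no transparent compression there):

```shell
f=$(mktemp)                   # put this file on /postgres-zfs to see the compression
dd if=/dev/zero of="$f" bs=1M count=16 2>/dev/null

ls -lh "$f"                   # logical (uncompressed) size: 16M
du -h --apparent-size "$f"    # logical size again: 16M
du -h "$f"                    # size actually allocated on disk
rm "$f"
```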

Tim Schumacher
  • So if I have a 150GB volume, I can throw a ~200GB file on there as in `ls -alh` will read that the file is ~200GB but it will not blow up / actually be able to store it because I know in my head the 200GB I'm reading is actually the uncompressed size? – Tony Apr 05 '18 at 22:29
  • Yes, ls will show the full 200GB. – Tim Schumacher Apr 05 '18 at 22:31
  • wait...but wouldn't the compression ratio be different for different types of files? how can there be a guaranteed compression ratio for every file? – Tony Apr 05 '18 at 23:05
  • 1
    oh i think it's the average compression ratio for all files. is there a way to see the compressed size of just one file? – Tony Apr 05 '18 at 23:07
  • 1
    `du -h` does output the compressed size of whatever you give it. `du -h --apparent-size` shows the uncompressed size. I guess I'll put up a table with all the different commands shortly. – Tim Schumacher Apr 06 '18 at 11:15