Explain lz4 double buffer example

Question

The lz4 examples include one named double buffer: https://github.com/Cyan4973/lz4/blob/master/examples/blockStreaming_doubleBuffer.c. It declares char inpBuf[2][BLOCK_BYTES] and, during the read-compress loop, alternates between inpBuf[0] and inpBuf[1].

I cannot understand the benefit of this. Why not use a single buffer? What am I missing?

The example itself only shows how to compress/decompress using two buffers. There is no reason mentioned -- nor necessary -- *in the example*. If (for whatever reason!) you happen to already use two buffers, you can use this style of coding. — Jongware, Feb 17 '15 at 14:58

score 2 · Accepted Answer · answered Feb 17 '15 at 18:19

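The benefit of the double buffer is a better compression ratio: each block is compressed while the previous block is still available as history, so the compressor can reference it. This only matters if you don't have enough memory to hold your entire object/file as a single block.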

This is not obvious, so it deserves a comparison to verify it.

You can run this exercise if you want to experience it more directly:

1) Compress a file, by cutting it into blocks of 4 KB, and compressing each block independently. Note the final compression ratio.

2) Compress the same file, but using a double buffer with 2 blocks of 4 KB, following the same method as the example. Note the final compression ratio; it should be greatly improved.

3) For a fairer comparison, redo test 1, but using 8 KB independent blocks this time, so that tests 2 and 3 use the same amount of memory. You should, once again, notice that test 2 offers a better compression ratio.

4) The ratio difference is even more pronounced if using the "HC" version of LZ4, rather than the "fast" one.

So, to summarize:

- If you have enough memory to hold your whole object/file in memory, you don't need to use this method.
- If you have to cut your input data into smaller blocks, you can get a better compression ratio by using a double buffer rather than independent blocks. The downside is that it is more complex to set up.

thanks for the reply! I don't really get why 2 buffers of 4K can produce better compression than one 8K buffer. The implementation seems to fill up a buffer on each loop and just pass this one buffer to LZ4_compress. How does LZ4_compress care if we also have another 4K buffer which is unused in the current loop? — sivann, Feb 21 '15 at 14:28
Previous buffers are tracked by LZ4 automatically within its context structure. LZ4 just "remembers" them. That's why LZ4 requires that previous data remain "stable" (unmodified) up to 64KB, the only exception being small ring buffers, which are also automatically tracked. — Cyan, Feb 24 '15 at 13:16
