How does DEFLATE determine the block size (specifically, in zlib)?

RFC 1951 gives the following explanation: "The compressor terminates a block when it determines that starting a new block with fresh trees would be useful, or when the block size fills up the compressor's block buffer."

That is not enough for me. I want to know in detail which conditions cause the current block to end and a new block to start.

How does DEFLATE decide whether fresh trees would be useful or not? What is the size of the compressor's block buffer?

S.Lee

1 Answer

zlib's deflate ends the block when either the current symbol buffer fills up (by default 16,383 symbols), or the input data is complete (Z_FINISH was requested). deflate in zlib does not try to judge when it might be beneficial to end a block earlier.

One symbol in this case is either one literal, or one match of any length.

The size of the symbol buffer is determined by the memLevel parameter of deflateInit2(). A memLevel of 8, which is the default used by deflateInit(), results in 16,383 symbols. memLevel can be 1 to 9, where the symbol buffer size is (1 << (memLevel + 6)) - 1.
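As a rough illustration of the formula above, here is a minimal Python sketch that tabulates the symbol limit for each memLevel. The helper name `symbol_buffer_limit` is hypothetical and not part of zlib; it only restates the arithmetic from this answer. The second part uses Python's standard `zlib.compressobj`, which accepts a `memLevel` argument, to show where the parameter is passed. It does not inspect the block boundaries in the output.

```python
import zlib


def symbol_buffer_limit(mem_level: int) -> int:
    """Hypothetical helper: symbols per block per the formula in this answer."""
    if not 1 <= mem_level <= 9:
        raise ValueError("memLevel must be in the range 1..9")
    return (1 << (mem_level + 6)) - 1


# Tabulate the limit for every valid memLevel.
for mem_level in range(1, 10):
    print(f"memLevel={mem_level}: {symbol_buffer_limit(mem_level)} symbols")

# Passing memLevel through Python's zlib binding (the counterpart of deflateInit2()).
data = bytes(range(256)) * 1024
for mem_level in (1, 9):
    compressor = zlib.compressobj(
        level=6, method=zlib.DEFLATED, wbits=zlib.MAX_WBITS, memLevel=mem_level
    )
    compressed = compressor.compress(data) + compressor.flush()
    # Round-trip check: the choice of memLevel never affects correctness.
    assert zlib.decompress(compressed) == data
```

A smaller memLevel means a smaller symbol buffer, so blocks end sooner and the compressor uses less memory; the decompressed result is the same either way.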

Mark Adler
  • It seems to me that a trivial optimisation is possible here. At the start of the next block, take the differences between the old and new code lengths and multiply them by the respective symbol frequencies. If the total is less than the size of the end code plus the size of the new table, and the new table doesn't use any zero-length codes, then defer them and emit the new block with the old table. – sh1 Feb 22 '17 at 18:39
  • We should point out that Deflate compression does not require multiple blocks. Gigabytes of data can all be in a single block. New blocks just give us a chance to build better Huffman trees and improve the compression ratio. – Ivan Kuckir May 10 '18 at 20:41