Why permutate the code length codes ( Dynamic Huffman Codes | RFC-1951 )?

Question

I've been studying RFC 1951 (the DEFLATE compression algorithm) and found the interesting "learning" source file by Mark Adler with a cute name -- puff.c. And while brain parsing that file has been an illuminating experience with reasonable following of the RFC-1951 spec, the author does something which is not really clear to me -- he permutates the order by which he indexes into the length array.

The explanation is a bit vague in the comments of the code:

The code length code lengths are received in a permuted order (see the order[] array below) to make a short code length code length list more likely. As it turns out, very short and very long codes are less likely to be seen in a dynamic code description, hence what may appear initially to be a peculiar ordering.

And here's the permutation array described upstairs:

static const short order[19] =      /* permutation of code length codes */
        {16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15};

So, my questions are:

Why is there no mention of this in RFC-1951?
How did they decide upon the exact permutation laid out in order?
How does it work? Why doesn't the decoding process fail since the order has changed?

Much obliged. This isn't homework nor am I trying to outmatch the original algorithm, just for the learning experience. I searched this website and Google was my friend, but to no avail.

score 4 · Accepted Answer · edited Oct 07 '21 at 05:59

4

It is in RFC 1951:

       (HCLEN + 4) x 3 bits: code lengths for the code length
          alphabet given just above, in the order: 16, 17, 18,
          0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

The decoding process would fail if you tried to change this (without also changing it on the deflate end, in which case it could no longer be called deflate).

The most likely code length codes to be used are early in the list, and the least likely are later in the list. This allows the list to be shorter most of the time, hence HCLEN being smaller, saving three bits per list item not present.

edited Oct 07 '21 at 05:59

Community

1
1

answered May 25 '13 at 14:46

Mark Adler

101,978
13
118
158

Beautiful, must've missed it. Thank you! But, I must ask -- is there somewhere I can read on how they actually determined the permutated order? – Questionnaire May 25 '13 at 18:39
2

The code lengths can be 1..15, with the smallest and largest least likely. So 1 and 15 are at the end. 2 and 14 are before that. And so on. The repeat codes 16, 17, and 18 are pretty much always used, as is 0 indicating that the symbol is not present. So those might as well be at the beginning. – Mark Adler May 26 '13 at 04:05

Why permutate the code length codes ( Dynamic Huffman Codes | RFC-1951 )?

1 Answers1