Canonical Huffman Encoder: Contents of Encoded Bitstream

Question

Let's say we have the following canonical Huffman code table.

Symbol    Code-length   Codeword
 A            2          00
 B            2          01
 C            2          10
 D            2          11

Now, we read the symbols from an input file and encode them by looking them up in the above table. However, many resources say that in the case of canonical Huffman coding we should not send the codewords; the code length for each symbol is enough.

If a text file contains ACCDB, should I transmit 00 01 10 11 or 10 10 10 10 (the binary equivalent of the corresponding code lengths) as the encoded bit stream? Please correct me if I am wrong; I would appreciate any explanation.

Moreover, if that is the case for canonical Huffman, how would we decode that bit stream to get back the original symbols ACCDB (without using a Huffman tree at the decoder)?

After your edits to the question, that's still not a prefix code. A is a prefix of both C and D. In a prefix code, no code can have any other code as its prefix. — Mark Adler, Jan 20 '17 at 00:28
Now it is an incomplete code. 100 and 111 are not used. You could lop off the last bit of C and D to make those 10 and 11 to make it a complete code, with lengths 2 2 2 2. The only valid four-symbol code lengths are 2 2 2 2 and 1 2 3 3. — Mark Adler, Jan 20 '17 at 14:33

score 0 · Answer 1 · answered Jan 19 '17 at 15:47

That is not a canonical Huffman code table, nor is that a Huffman code, nor is that a prefix code. The code lengths 1, 2, 2, 3 oversubscribe the available bits. 1, 2, 2 is a complete code, allowing no more symbols to be coded.
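
As a rough illustration of what "oversubscribed" and "complete" mean here, the lengths can be checked against the Kraft sum: add up 2^-length over all symbols and compare with 1. This is a minimal sketch; the function names `kraft_sum` and `classify` are made up for the example and do not come from any library.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Sum of 2**-l over all code lengths, as an exact fraction."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

def classify(lengths):
    """Label a set of code lengths by comparing the Kraft sum with 1."""
    s = kraft_sum(lengths)
    if s > 1:
        return "oversubscribed"  # no prefix code with these lengths exists
    if s == 1:
        return "complete"        # every bit pattern is used, no room for more symbols
    return "incomplete"          # some bit patterns are never used

print(classify([1, 2, 2, 3]))  # oversubscribed (sum is 9/8)
print(classify([1, 2, 2]))     # complete (sum is 1)
print(classify([1, 2, 3, 3]))  # complete
print(classify([2, 2, 2, 2]))  # complete
```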

1, 2, 3, 3 is a complete and not-oversubscribed code, in which case an example of the codes would be 0, 10, 110, 111. You can see that those codes can be decoded uniquely, reading them left to right.
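
To make that concrete, here is a small sketch of how codes like 0, 10, 110, 111 can be derived from the lengths alone. It assumes the usual canonical convention (the one DEFLATE uses): sort symbols by length and then by symbol, and count upward, shifting left whenever the length grows. Under that convention the decoder can rebuild the same table from the lengths, so the lengths describe the code while the encoded data itself is still the sequence of codewords. The helper names and the bit-string representation are illustrative only, and every length is assumed to be at least 1.

```python
def canonical_codes(lengths):
    """Map symbol -> bit string, given a dict of symbol -> code length."""
    code = 0
    prev_len = 0
    codes = {}
    # Shorter codes first; ties broken by symbol order.
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len           # widen the code if the length grew
        codes[sym] = format(code, "0{}b".format(length))
        code += 1                            # next code of the same length
        prev_len = length
    return codes

def encode(text, codes):
    """Concatenate the codeword of each symbol."""
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Read bits left to right, emitting a symbol whenever a codeword matches."""
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)

codes = canonical_codes({"A": 1, "B": 2, "C": 3, "D": 3})
print(codes)                # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
bits = encode("ACCDB", codes)
print(bits)                 # 011011011110
print(decode(bits, codes))  # ACCDB
```

The dictionary lookup in `decode` is only for readability; it relies on the prefix property, so the first match is always the right one.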

Mark Adler

Thank you, Mark. Okay, I understand the point about the codes and the available bits. However, from one of your earlier explanations in another thread, we are not supposed to send the codewords to the decoder, right? So, can you please tell me what the encoded bit stream would look like for the above example? In the case of regular Huffman, it would just be the corresponding codewords. My other question was: if we are not supposed to use a Huffman tree at the decoder, how will the decoder infer the codeword-to-symbol mapping (for example, from the code lengths)? Thanks. – beginner Jan 19 '17 at 18:44
That's two more questions, so you should post two new questions. – Mark Adler Jan 19 '17 at 20:14
Okay, sure thing. – beginner Jan 19 '17 at 20:37
