How is a Huffman compression decoded?

Question

I'm struggling to understand how to decode, say, a text file that has been compressed using Huffman's method. Let's say I'm reading a text file, I get a list of all the characters and the frequency in which they occur, I create a Huffman tree and all the characters have a specific code assigned do them. Say, a: 110 b: 11 c: 010 etc.

When I want to decompress this text file and print/read its contents, how do I do that? How do I know if the file reads "abc" or "bac"?

A small solution I made up was after the Huffman tree has been created, I read the file all over again and create an array to insert every character code as I read it.

Say, a while loop where I read a character until I've reached EOF. character = a; insert 110 into array. Character = b; insert 11 into array until we are left with 11011010. But I feel like there should be a better way.

EDIT: The codes for a,b, and c are random, not actual Huffman codes. I put in random ones as it's irrelevant for the question, I'm only interested in how it would be decoded with or without a real life example. But here's an example of Huffman code for "Hello World."

l: 11 o: 001 H: 100 e: 0101 spacebar: 0000 w: 0001 r: 101 d: 011 .: 0100

"Say, a: 110 b: 11" -- that's not a valid Huffman code, since `b` is a prefix of `a`. — Dan Mašek, Feb 16 '23 at 19:17
@DanMašek Updated my post and noted it to another commenter, the Huffman code is a random made-up one, not an actual one I'm working with. I just need to know how it would decode it. — JosephE, Feb 16 '23 at 19:26
Now your question is making even less sense. You say "insert 110 into array", which would be part of _encoding_, not decoding, and then ask if there is a better way. A better way to do what? Are you asking about how to encode or how to decode? — Mark Adler, Feb 16 '23 at 20:03

Mark Adler · Answer 1 · 2023-02-16T22:20:06.790

A Huffman code is a prefix code, which means that no code can be a prefix of any other code. Your example of a Huffman code is most definitely not a Huffman code. There you have 11 (c), which is a prefix of 110 (b). That cannot be the result of a correct implementation of Huffman's algorithm.

Update for question update:

You are incorrect. The codes are extremely relevant for the question. The examples you gave cannot be unambiguously decoded.

Second update of question:

It is still not clear what you're asking, but here is an answer to the question: "How do I decode a stream of bits that are a Huffman-coded sequence of symbols?"

Here is the tree for the example prefix code:

You see that if you follow any sequence of branches to a symbol, the branches you followed are the bits in that code. That is exactly how you decode the incoming stream of bits.

Start at the node at the top of the tree.
Read one bit from the stream.
Follow the branch for that value, left for 0, right for 1.
If you arrive at another node, go to step 2.
Otherwise, emit the symbol in the leaf, and go to step 1.

I just used random numbers for the prefixes, should've probably clarified that. But yeah, the codes for each character was random, I just wanted to know how it would read them — JosephE, Feb 16 '23 at 19:24

How is a Huffman compression decoded?

1 Answers1