binary prefix code in Huffman algorithm

Question

In the Huffman coding algorithm, there's a lemma that says:

The binary tree corresponding to an optimal binary prefix code is full

But I can't figure out why. How can you prove this lemma?

I would guess the best approach would be to prove that any non-full tree leads to a non-optimal code, but it's been a long time since I did that sort of thing... This isn't really a programming question though, so it's a bit off-topic... — twalberg, May 16 '14 at 16:47
@twalberg I think there is an easier approach to prove this. Please have a look at my answer and comment if it is incorrect. — Nikunj Banka, May 16 '14 at 16:55

score 2 · Answer 1 · answered May 16 '14 at 19:49

Any binary prefix code can be represented as a binary tree. The codeword of each symbol is the path from the root to that symbol's leaf, with a left edge representing a 0 and a right edge representing a 1 (or vice versa). Keep in mind that each symbol has exactly one leaf node.

To prove that an optimal code is represented by a full binary tree, let's recall what a full binary tree is: a tree in which every node is either a leaf or has two children.
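
A minimal Python sketch of this tree view, assuming a code is given as a dict from symbol to a codeword string of '0'/'1' characters and that the code is already prefix-free. The names Node, build_tree and is_full are illustrative only, not from any library.

```python
class Node:
    def __init__(self):
        self.children = {}  # maps the bit '0' or '1' to a child Node
        self.symbol = None  # set only on leaves

def build_tree(code):
    """Build the code tree (a binary trie); code maps symbol -> prefix-free bit string."""
    root = Node()
    for symbol, word in code.items():
        node = root
        for bit in word:  # follow (or create) one edge per bit
            node = node.children.setdefault(bit, Node())
        node.symbol = symbol  # the path ends at this symbol's leaf
    return root

def is_full(node):
    """Full: every node is a leaf (0 children) or has exactly 2 children."""
    if not node.children:
        return True
    return len(node.children) == 2 and all(is_full(c) for c in node.children.values())

full_code = {"a": "0", "b": "10", "c": "11"}
gappy_code = {"a": "0", "b": "10", "c": "110"}  # the node at "11" has only one child
print(is_full(build_tree(full_code)))   # True
print(is_full(build_tree(gappy_code)))  # False
```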

Let's assume that a certain code is optimal and is represented by a non-full tree. Then there is some vertex u with only a single child v. The edge between u and v contributes the bit x to the codewords of all symbols (at the leaves) in the subtree rooted at v. I can remove this edge and replace u with v, which shortens the codeword of every symbol in the subtree rooted at v by one bit and leaves all other codewords unchanged. The result is still a prefix code, because u had no other child, so no other codeword passed through u. Since the subtree rooted at v contains at least one leaf, at least one codeword gets shorter, and (assuming every symbol occurs with positive frequency) the total number of bits needed to encode the data strictly decreases.
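
The counting step can be written out as a short derivation. The notation is introduced here for illustration: f(c) is the frequency of symbol c, d_T(c) is the depth of its leaf in tree T (equal to its codeword length), B(T) is the total number of encoded bits, L(v) is the set of leaves in the subtree rooted at v, and T' is the tree after contracting the edge (u, v). As above, it assumes f(c) > 0 for every symbol.

```latex
\begin{align*}
B(T)      &= \sum_{c} f(c)\, d_T(c), \\
d_{T'}(c) &= \begin{cases} d_T(c) - 1 & \text{if } c \in L(v), \\ d_T(c) & \text{otherwise}, \end{cases} \\
B(T')     &= B(T) - \sum_{c \in L(v)} f(c) \;<\; B(T),
\end{align*}
% strict inequality: L(v) is non-empty and every f(c) > 0
```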

This shows that the tree didn't actually represent an optimal code, contradicting the assumption. This proves the lemma.
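
The same argument can be checked on a concrete code with a small Python sketch. It assumes a code is a dict from symbol to a prefix-free '0'/'1' string and freq maps each symbol to a positive count; all helper names are hypothetical. find_single_child_prefix looks for an internal node u (identified by its prefix) with a single child, contract deletes the bit on the edge (u, v) from every codeword below it, and cost is the total number of encoded bits.

```python
def find_single_child_prefix(code):
    """Return (prefix, bit) for an internal node with exactly one child, or None."""
    words = list(code.values())
    # In a prefix-free code, internal nodes are exactly the proper prefixes of codewords.
    prefixes = {w[:i] for w in words for i in range(len(w))}
    for p in sorted(prefixes, key=len):
        kids = {w[len(p)] for w in words if len(w) > len(p) and w.startswith(p)}
        if len(kids) == 1:
            return p, kids.pop()
    return None

def contract(code, prefix, bit):
    """Remove edge `bit` below `prefix`: drop that bit from every codeword in the subtree."""
    k = len(prefix)
    return {s: (w[:k] + w[k + 1:] if w.startswith(prefix + bit) else w)
            for s, w in code.items()}

def cost(code, freq):
    """Total bits to encode the data: sum of frequency * codeword length."""
    return sum(freq[s] * len(w) for s, w in code.items())

code = {"a": "0", "b": "10", "c": "110"}  # the node "11" has only one child
freq = {"a": 5, "b": 3, "c": 2}
while (found := find_single_child_prefix(code)) is not None:
    before = cost(code, freq)
    code = contract(code, *found)
    print(found, before, "->", cost(code, freq))  # ('11', '0') 17 -> 15
print(code)  # {'a': '0', 'b': '10', 'c': '11'}
```

Each contraction lowers the cost by exactly the total frequency of the symbols below the contracted edge (here 2, the frequency of "c"), matching the argument above. The loop requires Python 3.8+ for the `:=` operator.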

score 1 · Accepted Answer · answered May 16 '14 at 16:54

From Wikipedia:

A full binary tree (sometimes 2-tree or strictly binary tree) is a tree in which every node other than the leaves has two children.

The way the Huffman tree is constructed guarantees that it is a full binary tree. At each step of the algorithm, we remove the two nodes of highest priority (lowest probability) from the queue and create a new internal node with these two nodes as its children. Every internal node is created this way, so every internal node has exactly two children.
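
A minimal Python sketch of the construction described above, using the standard heapq module as the priority queue. Representing internal nodes as 2-tuples makes the point structural: the merge step can only ever create a node with exactly two children. The function names, the tuple representation, and the assumption of at least two symbols with positive weights are choices made for this illustration.

```python
import heapq
from itertools import count

def huffman_tree(freq):
    """Build a Huffman tree from {symbol: positive weight} (at least two symbols assumed).

    Leaves are the symbols themselves; every internal node is a (left, right) tuple.
    """
    tiebreak = count()  # unique counter so equal weights never force comparing trees
    heap = [(w, next(tiebreak), s) for s, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # lowest weight
        w2, _, t2 = heapq.heappop(heap)  # next lowest weight
        # The only way a new internal node is created: with exactly two children.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))
    return heap[0][2]

def codewords(tree, prefix=""):
    """Read codewords off the tree: left edge = '0', right edge = '1'."""
    if not isinstance(tree, tuple):  # a leaf holds a symbol
        return {tree: prefix}
    left, right = tree
    return {**codewords(left, prefix + "0"), **codewords(right, prefix + "1")}

tree = huffman_tree({"a": 1, "b": 1, "c": 2, "d": 4, "e": 8})
print(tree)             # ('e', ('d', ('c', ('a', 'b'))))
print(codewords(tree))  # {'e': '0', 'd': '10', 'c': '110', 'a': '1110', 'b': '1111'}
```

These skewed weights give a very unbalanced tree (codeword lengths 1, 2, 3, 4, 4), yet every internal node still has two children, so the tree is still full.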

Nikunj Banka

@twalberg No matter how unbalanced the tree is, that does not affect whether it is full: even in that case, every non-leaf node has two children. – Nikunj Banka May 16 '14 at 17:32
