
In the Huffman coding algorithm, there's a lemma that says:

The binary tree corresponding to an optimal binary prefix code is full

But I can't figure out why. How can you prove this lemma?

Kadaj13
  • I would guess the best approach would be to prove that any non-full tree leads to a non-optimal code, but it's been a long time since I did that sort of thing... This isn't really a programming question though, so it's a bit off-topic... – twalberg May 16 '14 at 16:47
  • @twalberg I think there is an easier way to prove this. Please have a look at my answer and comment if it is incorrect. – Nikunj Banka May 16 '14 at 16:55

2 Answers


Any binary prefix code can be represented as a binary tree. The codeword of a symbol is the path from the root to that symbol's leaf, with a left edge contributing a 0 to the codeword and a right edge contributing a 1 (or vice versa). Keep in mind that each symbol has exactly one leaf node.
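To make the tree-to-code correspondence concrete, here is a minimal sketch. The representation is an assumption made for illustration, not something from the question: a leaf is a symbol (a string), an internal node is a `(left, right)` tuple, and a missing child is `None`. The function name `codewords` is likewise hypothetical.

```python
# Assumed representation: a leaf is a str, an internal node is a
# (left, right) tuple, and a missing child is None.

def codewords(node, prefix=""):
    """Return {symbol: codeword} by walking every root-to-leaf path."""
    if node is None:                      # missing child: nothing to encode
        return {}
    if isinstance(node, str):             # leaf: the path so far is the codeword
        return {node: prefix}
    left, right = node
    table = codewords(left, prefix + "0")          # left edge appends a 0
    table.update(codewords(right, prefix + "1"))   # right edge appends a 1
    return table

tree = (("a", "b"), "c")
print(codewords(tree))  # {'a': '00', 'b': '01', 'c': '1'}
```

No codeword is a prefix of another here because symbols sit only at leaves.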

To prove that an optimal code is represented by a full binary tree, let's recall what a full binary tree is: it is a tree where each node is either a leaf or has two children.

Let's assume, for contradiction, that a certain code is optimal but is represented by a non-full tree. Then there is a vertex u with only a single child v. The edge between u and v adds one bit x to the codeword of every symbol whose leaf lies in the subtree rooted at v. From this tree I can remove the edge between u and v and put v in place of u, which shortens the codeword of every symbol in the subtree rooted at v by one bit and leaves all other codewords unchanged. The result is still a prefix code, because leaves remain leaves. The subtree rooted at v contains at least one leaf (v itself, when v is a leaf), so at least one symbol gets a shorter codeword.

So the new code is strictly cheaper (assuming every symbol occurs with nonzero frequency), which shows that the tree didn't actually represent an optimal code. The premise was wrong, and that proves the lemma.
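The contraction step in this argument can be sketched in a few lines. Everything here is an illustrative assumption: the tuple representation (a leaf is a string, an internal node is a `(left, right)` tuple, a missing child is `None`), the helper names `depths`, `contract` and `cost`, and the made-up frequencies.

```python
def depths(node, depth=0):
    """Return {symbol: codeword length} for every leaf under node."""
    if node is None:
        return {}
    if isinstance(node, str):
        return {node: depth}
    left, right = node
    result = depths(left, depth + 1)
    result.update(depths(right, depth + 1))
    return result

def contract(node):
    """Replace every single-child node u by its only child v."""
    if node is None or isinstance(node, str):
        return node
    left, right = contract(node[0]), contract(node[1])
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)

def cost(node, freq):
    """Total encoded length: sum of frequency times codeword length."""
    return sum(freq[s] * d for s, d in depths(node).items())

freq = {"a": 5, "b": 2, "c": 1}          # made-up frequencies
non_full = ("a", (("b", "c"), None))     # the node above (b, c) has one child
full = contract(non_full)

print(full)                                   # ('a', ('b', 'c'))
print(cost(non_full, freq), cost(full, freq)) # 14 11
```

The codewords of b and c each lose one bit while a is untouched, so the cost drops from 14 to 11, which is exactly the strict improvement the proof relies on.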

shebang

From Wikipedia:

A full binary tree (sometimes 2-tree or strictly binary tree) is a tree in which every node other than the leaves has two children.

The way the tree for a Huffman code is built always produces a full binary tree, because at each step of the algorithm we remove the two nodes of highest priority (lowest probability) from the queue and create a new internal node with these two nodes as children.
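A minimal sketch of that construction, using Python's `heapq` as the priority queue, shows why no single-child node can appear. The function names `huffman_tree` and `is_full`, the tuple representation (leaves are strings, internal nodes are `(left, right)` tuples) and the sample frequencies are all assumptions made for illustration.

```python
import heapq
from itertools import count

def huffman_tree(freq):
    """Build a Huffman tree; leaves are symbols, internal nodes are (left, right)."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Remove the two lowest-weight nodes ...
        w1, _, n1 = heapq.heappop(heap)
        w2, _, n2 = heapq.heappop(heap)
        # ... and make them the two children of a new internal node.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (n1, n2)))
    return heap[0][2]

def is_full(node):
    """True if every node is a leaf or has exactly two children."""
    if isinstance(node, str):
        return True
    left, right = node
    return (left is not None and right is not None
            and is_full(left) and is_full(right))

tree = huffman_tree({"a": 5, "b": 2, "c": 1, "d": 1})
print(is_full(tree))  # True
```

Every internal node is created with exactly two children, so fullness holds by construction regardless of how unbalanced the resulting tree is.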

Nikunj Banka
  • @twalberg How unbalanced the tree is has no bearing on whether it is full, because even in that case all the non-leaf nodes have two children. – Nikunj Banka May 16 '14 at 17:32