In the Huffman coding algorithm, there's a lemma that says:
The binary tree corresponding to an optimal binary prefix code is full.
But I can't figure out why. How can you prove this lemma?
Any binary prefix code can be represented as a binary tree. Each codeword is the path from the root to a leaf, with a left edge standing for a 0 and a right edge standing for a 1 (or vice versa). Keep in mind that each symbol has exactly one leaf node.
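To make the correspondence concrete, here is a minimal sketch that turns a code table into such a tree. The representation is my own choice for illustration: internal nodes are dicts keyed by '0' and '1', leaves are the symbols themselves, and the name build_code_tree is hypothetical.

```python
def build_code_tree(code):
    """Build a binary tree from a {symbol: bitstring} table.

    Internal nodes are dicts keyed by '0'/'1'; leaves are the symbols.
    Raises ValueError if the table is not a prefix code.
    """
    root = {}
    for symbol, bits in code.items():
        if not bits:
            raise ValueError("empty codeword")
        node = root
        # Walk (and create) the internal nodes along the path.
        for bit in bits[:-1]:
            node = node.setdefault(bit, {})
            if not isinstance(node, dict):
                # We ran into a leaf: another codeword is a prefix of this one.
                raise ValueError("not a prefix code")
        last = bits[-1]
        if last in node:
            # Slot already taken: duplicate codeword, or this one is a prefix.
            raise ValueError("not a prefix code")
        node[last] = symbol
    return root


tree = build_code_tree({"a": "0", "b": "10", "c": "11"})
print(tree)
```

Here every symbol ends up at its own leaf, and the codeword can be read back by following the keys from the root.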
To prove that an optimal code is represented by a full binary tree, let's first recall what a full binary tree is: it is a tree in which each node is either a leaf or has two children.
Assume, for contradiction, that some optimal code is represented by a non-full tree. Then there is a vertex u with exactly one child v. The edge between u and v contributes one bit x to the codeword of every symbol whose leaf lies in the subtree rooted at v. Remove that edge and replace u with v: the result is still a valid prefix-code tree, because the symbols remain at distinct leaves, and the codeword of every symbol in the subtree rooted at v is now one bit shorter. That subtree contains at least one leaf (exactly one when v is itself a leaf), so at least one codeword gets shorter and none gets longer. As long as that symbol occurs with nonzero frequency, the expected code length strictly decreases.
This shows that the tree did not actually represent an optimal code, which contradicts the assumption and proves the lemma.
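The contraction step in this argument can be tried out on a small example. This is only a sketch under my own assumptions: the tree uses dicts keyed by '0'/'1' for internal nodes and plain strings for leaves, the helper names (contract, codewords, cost) and the frequencies are made up for illustration, and the alphabet has at least two symbols.

```python
def contract(node):
    """Replace every single-child node u by its only child v."""
    if not isinstance(node, dict):
        return node  # a leaf (symbol) stays as it is
    children = {bit: contract(child) for bit, child in node.items()}
    if len(children) == 1:
        # u has one child: drop the edge and put v in u's place.
        return next(iter(children.values()))
    return children


def codewords(node, prefix=""):
    """Read the {symbol: bitstring} table back off the tree."""
    if not isinstance(node, dict):
        return {node: prefix}
    table = {}
    for bit, child in node.items():
        table.update(codewords(child, prefix + bit))
    return table


def cost(table, freq):
    """Total number of bits: sum of frequency times codeword length."""
    return sum(freq[s] * len(bits) for s, bits in table.items())


# Non-full tree: the node reached by "11" has only one child (the leaf c).
non_full = {"0": "a", "1": {"0": "b", "1": {"0": "c"}}}
freq = {"a": 5, "b": 3, "c": 2}  # hypothetical symbol counts

before = codewords(non_full)
after = codewords(contract(non_full))
print(before, cost(before, freq))  # c is coded as 110
print(after, cost(after, freq))    # c is coded as 11
```

With these counts the total drops from 17 bits to 15, so the non-full tree could not have been optimal.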
From Wikipedia:
A full binary tree (sometimes 2-tree or strictly binary tree) is a tree in which every node other than the leaves has two children.
The way in which the tree for a Huffman code is built always produces a full binary tree, because at each step of the algorithm we remove the two nodes of highest priority (lowest probability) from the queue and create a new internal node with these two nodes as children. Every internal node therefore has exactly two children.
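As a rough illustration of that construction, here is a minimal sketch using Python's heapq as the priority queue. The tuple representation (an internal node is a pair of children, a leaf is a symbol string), the function names, and the frequencies are assumptions of mine, and the input is assumed to be non-empty.

```python
import heapq
from itertools import count


def huffman_tree(freq):
    """Build a Huffman tree from a non-empty {symbol: weight} table."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), sym) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Remove the two lowest-weight nodes ...
        w1, _, n1 = heapq.heappop(heap)
        w2, _, n2 = heapq.heappop(heap)
        # ... and make them the two children of a new internal node.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (n1, n2)))
    return heap[0][2]


def is_full(node):
    """True if every node is a leaf or has exactly two children."""
    if not isinstance(node, tuple):
        return True
    return len(node) == 2 and all(is_full(child) for child in node)


tree = huffman_tree({"a": 5, "b": 3, "c": 2})
print(tree, is_full(tree))
```

Internal nodes are only ever created as pairs, so is_full holds for whatever weights are passed in.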