
In the book I'm using for my class (and from what I've seen in a few other places), the algorithm for creating a Huffman tree seems to consist of these steps:

(1) Building a minheap based on the frequency of each character in whatever file or string is being read in.

(2) Popping off the 2 smallest values from the minheap and combining their weights into a new node.

(3) Re-inserting the new node back into the same minheap.
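Steps (1) to (3) can be sketched in Python as follows. This is a minimal sketch, not the book's function. It assumes the standard `heapq` module as the min-heap, a non-empty input string, and an illustrative tree representation: leaves are single characters and internal nodes are `(left, right)` tuples. The name `build_huffman_tree` is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(text):
    """Build a Huffman tree from the character frequencies in `text`.

    Hypothetical helper. Leaves are characters, internal nodes are
    (left, right) tuples. Returns (total_weight, tree). Assumes `text`
    is non-empty.
    """
    tiebreak = count()  # unique ints so heapq never has to compare trees
    # Step 1: one min-heap entry per character, keyed on its frequency.
    heap = [(freq, next(tiebreak), ch) for ch, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Step 2: pop the two smallest weights...
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # ...and combine them under a new node whose weight is their sum.
        # Step 3: push that new node back into the same min-heap.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    weight, _, tree = heap[0]
    return weight, tree

print(build_huffman_tree("abracadabra"))
```

The counter in each entry only breaks ties between equal weights so that Python never tries to compare a character with a tuple. Different tie-breaking can give a different tree, but any such tree yields an optimal code.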

I'm confused about step 3. Most Huffman trees I've seen look more like a max heap than a minheap (although they are not complete trees). That is, the root holds the maximum weight (or rather, the combined weight of everything below it), while each node's children have smaller weights. How does this implementation produce a Huffman tree when the combined nodes are put back into a minheap? I've been struggling with this for a while now.
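One way to see what step (3) does: the min-heap never stores the tree's layout. It only holds the roots of the subtrees built so far. The sketch below uses a hypothetical helper `heap_trace` and `heapq` on bare weights, leaving out the tree links for brevity. It records the heap's contents after each merge. The frequencies are an arbitrary example.

```python
import heapq

def heap_trace(freqs):
    """Return the sorted weights in the min-heap after each merge step.

    Hypothetical helper. Each heap entry stands for the root of a subtree
    that has not been merged yet. The tree itself would live in
    parent/child links, not in the heap's array layout.
    """
    heap = list(freqs)
    heapq.heapify(heap)
    snapshots = [sorted(heap)]
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # The new root's weight a + b is >= a and >= b (weights are
        # non-negative counts), so in the finished tree every parent
        # outweighs its children.
        heapq.heappush(heap, a + b)
        snapshots.append(sorted(heap))
    return snapshots

for step in heap_trace([5, 9, 12, 13, 16, 45]):
    print(step)
```

The heap shrinks by one entry per merge until only the root's weight (the total) remains. The min-heap only decides the order in which subtrees are merged. The max-heap-like ordering the question notices comes from each parent's weight being the sum of its children's.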

A similar question has already been posted here (using the same book as I am): "I don't understand this Huffman algorithm implementation"

In case you wanted to see the exact function described in (3).

Thanks for any help!

Wakka Wakka Wakka

1 Answer


A Huffman tree is often not a complete binary tree, and so is not a min-heap.
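As an illustration of why the finished tree is usually not complete, here is a sketch that tracks only each leaf's depth. The helper `leaf_depths` is hypothetical. Each heap entry carries a dict of the depths of the leaves beneath it, and merging two subtrees pushes all of their leaves one level deeper. With frequencies that double at each step, the result is a lopsided chain.

```python
import heapq
from itertools import count

def leaf_depths(freqs):
    """Return {symbol: depth} for a Huffman tree built from `freqs`.

    Hypothetical helper; `freqs` maps symbol -> frequency.
    """
    tiebreak = count()  # keeps heapq from comparing the dicts
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Every leaf in both merged subtrees moves one level down.
        merged = {sym: d + 1 for sym, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

print(leaf_depths({"a": 1, "b": 2, "c": 4, "d": 8}))
```

The leaves end up at depths 3, 3, 2 and 1. A complete binary tree with four leaves would put all of them at depth 2, so this Huffman tree is not complete and cannot be laid out as a heap.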

The Huffman algorithm is easily understood as a list of frequencies from which a tree is built. Small branches are constructed first, which will eventually all be merged into a single tree. Each list item starts off as a symbol, and later may be a symbol or a sub-tree that has been built. Each list item always has a frequency (an integer count usually).

Take the two smallest frequencies out of the list. (Ties don't matter; any choice will result in an optimal code, though there may be more than one optimal code.) Construct a single-level binary tree from those two, where the two leaves are the symbols for those frequencies. Add the two frequencies to get a new frequency representing the tree, and put that frequency back in the list. The list now has one fewer frequency in it.

Repeat. Now the binary tree constructed at each step may have symbol leaves on each branch, or one leaf and a previously constructed tree, or two trees (at earliest in the third step).

Keep going until there is only one frequency left in the list. That will be the sum of all the original frequencies. That frequency has the complete Huffman tree associated with it.
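The list-based description above translates fairly directly into code. This is a sketch with a hypothetical function name. It assumes the input is a symbol-to-frequency mapping, and it re-sorts a plain Python list on every step. That is simple but costs O(n log n) per merge, where a heap needs only O(log n).

```python
def huffman_from_list(freqs):
    """Build a Huffman tree using a plain list instead of a heap.

    Hypothetical helper. `freqs` maps symbol -> frequency. Each list item
    is (frequency, tree), where a tree is a symbol or a (left, right) pair.
    """
    items = [(f, sym) for sym, f in freqs.items()]
    while len(items) > 1:
        # Sort by frequency only, so the two smallest come first
        # (the key avoids ever comparing trees).
        items.sort(key=lambda item: item[0])
        (f1, t1), (f2, t2) = items[0], items[1]
        # Replace them with one item: a tree whose frequency is the sum.
        items = items[2:] + [(f1 + f2, (t1, t2))]
    return items[0]

print(huffman_from_list({"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}))
```

The last remaining item's frequency is the sum of all the original frequencies (100 here), and its tree is the complete Huffman tree.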

Now you can (arbitrarily) assign a 0 and a 1 to each binary branch. You build codes or decode codes by traversing the tree from the root to a symbol. The bits from the branches of that traverse are in order the Huffman code for that symbol.
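The code-assignment and decoding traversals can be sketched like this. The sketch assumes a tree of nested `(left, right)` tuples with symbol strings as leaves, at least two distinct symbols, and `0` for the left branch. The helper names and the hard-coded example tree are hypothetical.

```python
def assign_codes(tree, prefix=""):
    """Map each symbol to its bit string by walking from the root to each leaf.

    Hypothetical helper: the left branch is labelled "0", the right "1".
    """
    if not isinstance(tree, tuple):  # reached a leaf (a symbol)
        return {tree: prefix}
    left, right = tree
    codes = assign_codes(left, prefix + "0")
    codes.update(assign_codes(right, prefix + "1"))
    return codes

def decode(bits, tree):
    """Follow branches from the root; emit a symbol at each leaf, then restart."""
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if not isinstance(node, tuple):
            out.append(node)
            node = tree
    return "".join(out)

tree = ("a", (("c", "d"), ("b", "r")))  # hypothetical example tree
codes = assign_codes(tree)
encoded = "".join(codes[ch] for ch in "abracadabra")
print(codes)
print(decode(encoded, tree))
```

Because every symbol sits at a leaf, no code is a prefix of another. The decoder can therefore restart at the root as soon as it reaches a leaf, without any separators between codes.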

Mark Adler
Yet, the list you mention in which the frequencies are stored is probably best implemented using a heap: the two operations used are minimum extraction and frequency insertion. – Vincent Nivoliers Apr 25 '16 at 23:33