Huffman Tree Issue

Question

We just started on Huffman tree's in class and I'm having a few issues. First given the data and frequencies...

 Data         %    /    -    +     *
 Frequencies  5    10   25   30    50

Create a custom Huffman tree.

I created...

And then replaced the frequencies with the corresponding data, but my roommate received a different answer? Am I the one in the wrong here?

Also, I can't seem to wrap my head around this question,

What would the Huffman code look like if all symbols in the alphabet had
equal frequency?

Any and all help is much appreciated!

P.S. These are just study guide questions, not homework related.

EDIT: How I arrived at my answer:

Took 5 and 10 at the bottom of the tree, added those together to get a "ghost" node 15. Added 25 to the right of that because it is bigger, then created a ghost node 40 by adding those together. put 30 to the left of 40 because it is smaller, and then created a ghost node 70 by adding the two. Finally added 50 to the left of 70 because it was smaller and then created the final ghost node 120 by adding the two.

Did you write a program to calculate the tree? Or, if not, can you describe your algorithm? — Kenney, Nov 20 '15 at 20:11
for the second part: remember that the huffman tree is there to figure out vari-length representations of each character in the raw data. more frequent chars end up with shorter representations. if all chars have the same frequency, then there'd be very little point in doing huffman encoding on the data. you'd be better off just defining a new alphabet with fixed (if shorter) bit sequences for each char. — Marc B, Nov 20 '15 at 20:20
@Kenney i have updated my question with how I came to my answer. I understand that part, and I think it's a rather odd question, but what would the code look like if it was the case of all letters having the same frequency — Bob, Nov 20 '15 at 20:24
A program is nothing else than an algorithm written so a computer can execute it. So, it's a description of what to do when presented with a frequency table, which you gave in your edit - thanks! — Kenney, Nov 20 '15 at 21:08

Kenney · Accepted Answer · 2015-11-21T12:58:03.473

This is what my Huffman.pm comes up with:

0  -- *
10  -- +
110  -- -
1110  -- /
1111  -- %

The above are the shortest symbols encoding the input string. It is a tree, with an implicit root:

   Root
  /    \
0(*)    1
       /  \
     0(+)  1
          /  \
        0(-)  1
             /  \
           0(/) 1(%)

There are different ways to encode it, but it must be consistent. As per wikipedia:

the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol

Your tree

                        120
                       /   \
                    50(*)    70            
                           /  \
                        30(+)  40          
                               / \
                              15  25(-)
                             /  \
                          5(%)  10(/)

satisfies this, although with a different encoding:

The reason our trees are different is because mine is a Canonical Huffman Code, which means that I only have to list the symbols and length of their encoding (or path) for anyone to be able to re-construct the Huffman codes/tree:

*: 1
+: 2
-: 3
/: 4
%: 4

This is because a 0 always terminates a code, and 1 always means that at least 1 more node will follow, except for the outermost leaf (%, 1111), which is possible because we know the maximum code length (the maximum depth of the tree).

so by the looks of it, that would make mine wrong? What would that look like in tree form? — Bob, Nov 20 '15 at 20:52
What? His tree and your tree are identical. (Just flip the 15 and 25 to get the same appearance.) There is nothing wrong with his tree. — Mark Adler, Nov 21 '15 at 06:48
Better. However he did not give an encoding for his tree. There is no law that says you have to assign the 0 to the left branch and the 1 to the right branch. You can put the 0's and 1's on his tree to get the same code you did, or the complement of your code or whatever. — Mark Adler, Nov 21 '15 at 17:01
@Mark yes indeed. I added the canonical encoding part to explain and correct my mistake; I was too strict ;-) Anyway, thanks again! I'll be deleting the answer now, yours is better. I didn't intend for my answer to be the accepted one anyway. (Hm can't delete it..) — Kenney, Nov 21 '15 at 17:09

score 1 · Answer 2 · answered Nov 21 '15 at 06:52

You applied the Huffman algorithm correctly, your tree is fine, and it is the only possible tree with those frequencies. There may be different ways to draw it, but all correct trees for those frequencies will have the exact same topology. What is on the left or on the right does not matter. All that matters is how many branches there are to each symbol.

There are 16 different ways to assign bits to the branches (for each branch 0 could go on the left or the right), so you can get that many different codes. However they are all optimal. If your roommate's code had the same number of bits for each symbol, then you're both right, regardless of how you drew your trees or how you assigned the 0's and 1's.

Huffman Tree Issue

2 Answers2