
So I made a program that should do Huffman encoding. When comparing my answers to the correct ones, not all of mine matched up.

I got

[("a","0"),("c","100"),("b","101"),("d","110"),("f","1110"),("e","1111")]

The correct answer is

[('a',"0"),('b',"101"),('c',"100"),('d',"111"),('e',"1101"),('f',"1100")]

The correct tree is: [image: Correct Tree]

My method, however, gives a slightly different result: at branch 30 I use a 0 to get to d instead of a 1.

So this makes me wonder: are both answers correct? After all, both give every symbol a code of the same length.
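That reasoning can be checked with a minimal Haskell sketch. It assumes the two tables above are rewritten with `Char` symbols (the question's code uses `String`), and the names `prefixFree`, `lengths`, `mine` and `expected` are hypothetical. Since the cost of a prefix code is the sum of frequency × codeword length over all symbols, two prefix-free codes that give every symbol a codeword of the same length cost exactly the same, whatever the frequencies are.

```haskell
import Data.List (isPrefixOf, sort)

-- Hypothetical alias: a code table maps each symbol to its codeword.
type Code = [(Char, String)]

-- True when no codeword is a prefix of another symbol's codeword.
prefixFree :: Code -> Bool
prefixFree code =
  and [ not (a `isPrefixOf` b) | (s, a) <- code, (t, b) <- code, s /= t ]

-- Codeword length per symbol, sorted by symbol so two tables can be compared.
lengths :: Code -> [(Char, Int)]
lengths code = sort [ (s, length w) | (s, w) <- code ]

-- The two tables from the question, with Char symbols (an assumption).
mine, expected :: Code
mine     = [('a',"0"),('c',"100"),('b',"101"),('d',"110"),('f',"1110"),('e',"1111")]
expected = [('a',"0"),('b',"101"),('c',"100"),('d',"111"),('e',"1101"),('f',"1100")]

main :: IO ()
main = do
  print (prefixFree mine, prefixFree expected)  -- (True,True)
  print (lengths mine == lengths expected)      -- True
```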

If I am wrong, can someone explain why?

In case anyone wants it, my Haskell code is below:

import Data.List (sortBy)
import Data.Ord (comparing)

-- Combine two (symbols, weight) pairs into a single label.
mergHufffman :: (String, Int) -> (String, Int) -> (String, Int)
mergHufffman x y = (fst x ++ fst y, snd x + snd y)

data HTree a = Leaf a | Branch (HTree a) (HTree a) deriving Show

-- Expects its input sorted by descending weight (see sortFirst).
treeHuff :: [(String, Int)] -> HTree (String, Int)
treeHuff [] = error "treeHuff: empty frequency list"
treeHuff [x] = Leaf x
treeHuff [x, y]
        | snd x < snd y = Branch (Leaf x) (Leaf y)
        | otherwise     = Branch (Leaf y) (Leaf x)  -- also covers equal weights
treeHuff (x:y:z:list)
        | snd x > snd merged = Branch (Leaf x) (treeHuff $ sortFirst $ y:z:list)
        | otherwise          = Branch (treeHuff [y, z]) (treeHuff $ sortFirst $ x:list)
        where merged = mergHufffman y z

-- Sort by weight, largest first.
sortFirst :: [(String, Int)] -> [(String, Int)]
sortFirst freq = reverse $ sortBy (comparing snd) freq

-- Walk the tree, appending "0" for the left branch and "1" for the right.
readHuffTree :: HTree (String, Int) -> String -> [(String, String)]
readHuffTree (Branch x y) code = f1 ++ f2
  where
    f1 = readHuffTree x (code ++ "0")
    f2 = readHuffTree y (code ++ "1")
readHuffTree (Leaf x) code = [(fst x, code)]
karakfa
error_null_pointer
  • Usually the *first* things stated when introducing Huffman coding in lectures are 1) optimal codings aren't unique, and 2) there exists an optimal coding where the two least probable symbols are siblings and sit at maximal depth. The classical algorithm follows from these observations. – Bakuriu Feb 23 '16 at 10:11
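The classical algorithm the comment refers to fits in a few lines of Haskell. This is a minimal sketch of the textbook formulation, not a rewrite of the question's `treeHuff`: it keeps the forest in a list sorted by weight (standing in for a priority queue) and always makes the two lightest trees siblings. The `Tree` type, `huffman`, `codes`, `weightedLength` and the symbol names `A` to `D` are hypothetical; the frequencies 1, 1, 2, 2 come from the example in the answer.

```haskell
import Data.List (insertBy, sortOn)
import Data.Ord (comparing)

-- Hypothetical tree type for this sketch (unlike HTree, nodes carry no weights).
data Tree = Leaf Char | Node Tree Tree deriving Show

-- Repeatedly take the two lightest trees, make them siblings,
-- and insert the merged tree back in weight order.
huffman :: [(Char, Int)] -> Tree
huffman [] = error "huffman: no symbols"
huffman freqs = go (sortOn fst [ (w, Leaf c) | (c, w) <- freqs ])
  where
    go [(_, t)] = t
    go ((w1, t1) : (w2, t2) : rest) =
      go (insertBy (comparing fst) (w1 + w2, Node t1 t2) rest)
    go [] = error "huffman: unreachable"

-- Read codewords off the tree: left edge is '0', right edge is '1'.
codes :: Tree -> [(Char, String)]
codes = go ""
  where
    go acc (Leaf c) = [(c, acc)]
    go acc (Node l r) = go (acc ++ "0") l ++ go (acc ++ "1") r

-- Total encoded length: sum of frequency * codeword length.
weightedLength :: [(Char, Int)] -> Int
weightedLength freqs =
  sum [ w * length code | (c, code) <- codes (huffman freqs), Just w <- [lookup c freqs] ]

exampleFreqs :: [(Char, Int)]
exampleFreqs = [('A', 1), ('B', 1), ('C', 2), ('D', 2)]

main :: IO ()
main = do
  print (codes (huffman exampleFreqs))  -- [('D',"0"),('A',"100"),('B',"101"),('C',"11")]
  print (weightedLength exampleFreqs)   -- 12
```

How ties are broken (here `insertBy` places a merged tree ahead of existing trees of equal weight) decides which of the equally optimal trees comes out.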

1 Answer


Yes, you can assign 0 and 1 to left and right branches however you like, and you'll still have an optimal Huffman code. Both answers are correct, unless the assignment specified how to assign bit values to the branches. In fact, there are 32 correct answers since there are five branches with two choices each.
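The count of 32 can be checked mechanically. In this minimal Haskell sketch, which assumes `Char` symbols and uses hypothetical helper names (`fromCodes`, `labelings`, `options`), the tree shape is rebuilt from the correct code table, every way of giving 0 and 1 to the two children of each internal node is enumerated, and the question's code turns out to be one of the results.

```haskell
import Data.List (sort)

data Tree = Leaf Char | Node Tree Tree

-- Rebuild the tree shape implied by a complete prefix-free code table.
fromCodes :: [(Char, String)] -> Tree
fromCodes [] = error "fromCodes: table does not describe a complete tree"
fromCodes [(c, "")] = Leaf c
fromCodes table = Node (sub '0') (sub '1')
  where sub bit = fromCodes [ (c, rest) | (c, b : rest) <- table, b == bit ]

-- All codes obtained by choosing, at each internal node, which child gets 0.
labelings :: Tree -> [[(Char, String)]]
labelings (Leaf c) = [[(c, "")]]
labelings (Node l r) =
  [ map (prepend x) ls ++ map (prepend y) rs
  | ls <- labelings l
  , rs <- labelings r
  , (x, y) <- [('0', '1'), ('1', '0')]
  ]
  where prepend bit (c, code) = (c, bit : code)

mine, expected :: [(Char, String)]
mine     = [('a',"0"),('c',"100"),('b',"101"),('d',"110"),('f',"1110"),('e',"1111")]
expected = [('a',"0"),('b',"101"),('c',"100"),('d',"111"),('e',"1101"),('f',"1100")]

-- Every labelling of the correct tree, each sorted by symbol for comparison.
options :: [[(Char, String)]]
options = map sort (labelings (fromCodes expected))

main :: IO ()
main = do
  print (length options)            -- 32
  print (sort mine `elem` options)  -- True
```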

As it happens, for those frequencies that tree is the only possible tree (up to swapping children). However, some sets of frequencies can give trees with different topologies.

A simple example is the set of frequencies 1, 1, 2, 2. When you apply the Huffman algorithm, you will find that you have an arbitrary choice to make when picking the two lowest frequencies in the second step. Depending on which choice you make, you will end up with a tree where all codes have the same length (2), or with codes of lengths 1, 2, 3, and 3. Both codes are correct answers. Oh, and for the 1, 2, 3, 3 tree, you can pick which symbol with frequency 2 is at the top. So really three distinct trees are possible. And for each tree you have eight ways to assign the 0's and 1's (three internal nodes, each with two choices), so there are 24 correct answers in that case. If you multiply each code length by its symbol's frequency and add them up, you will see that both tree topologies give 12, so both are optimal.
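The tie-breaking argument can be checked by brute force. This is a minimal sketch, not an efficient implementation: the hypothetical `huffmanTrees` runs the Huffman merge step but, whenever several pairs share the two smallest weights, tries all of them, and `canon` identifies trees that differ only in which child is labelled 0. The symbol names `A` to `D` and all function names are assumptions for illustration.

```haskell
import Data.List (nub, sort)

data Tree = Leaf Char | Node Tree Tree deriving (Eq, Ord, Show)

-- Every tree Huffman's algorithm can produce, trying every tie-break:
-- at each step, merge any pair whose weights are the two smallest.
huffmanTrees :: [(Int, Tree)] -> [Tree]
huffmanTrees [] = []
huffmanTrees [(_, t)] = [t]
huffmanTrees items =
  [ t
  | i <- [0 .. n - 1]
  , j <- [i + 1 .. n - 1]
  , let (wi, ti) = items !! i
        (wj, tj) = items !! j
  , sort [wi, wj] == lowestTwo
  , let rest = [ x | (k, x) <- zip [0 :: Int ..] items, k /= i, k /= j ]
  , t <- huffmanTrees ((wi + wj, Node ti tj) : rest)
  ]
  where
    n = length items
    lowestTwo = take 2 (sort (map fst items))

-- Order children canonically so trees differing only by 0/1 labels compare equal.
canon :: Tree -> Tree
canon (Leaf c) = Leaf c
canon (Node l r) = Node (min l' r') (max l' r')
  where
    l' = canon l
    r' = canon r

-- Sum of frequency * depth (code length) over all leaves.
cost :: [(Char, Int)] -> Tree -> Int
cost freqs = go 0
  where
    go d (Leaf c) = maybe 0 (* d) (lookup c freqs)
    go d (Node l r) = go (d + 1) l + go (d + 1) r

-- Internal nodes: each one can have its 0 and 1 swapped.
internal :: Tree -> Int
internal (Leaf _) = 0
internal (Node l r) = 1 + internal l + internal r

exampleFreqs :: [(Char, Int)]
exampleFreqs = [('A', 1), ('B', 1), ('C', 2), ('D', 2)]

distinctTrees :: [Tree]
distinctTrees =
  nub (map canon (huffmanTrees [ (w, Leaf c) | (c, w) <- exampleFreqs ]))

main :: IO ()
main = do
  print (length distinctTrees)                        -- 3
  print (map (cost exampleFreqs) distinctTrees)       -- [12,12,12]
  print (sum [ 2 ^ internal t | t <- distinctTrees ]) -- 24
```

The three trees each have three internal nodes, which gives the 3 × 2^3 = 24 labelled codes counted above.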

Mark Adler