I was looking at constructing optimal Huffman codes over non-binary alphabets.

This question was asked in "Huffman trees for non-binary alphabets?". The suggested solution was to use the Huffman coding procedure, combining the n lowest-frequency symbols at a time (as Wikipedia also suggests). However, this does not seem to be optimal. Say I have an alphabet of 4 symbols with frequencies:

 A --> 0.4

 B --> 0.25

 C --> 0.2

 D --> 0.15

The ternary Huffman code derived using this procedure would be:

A --> 0

B --> 10

C --> 11

D --> 12

However, the following code has a shorter expected length:

A --> 0

B --> 1

C --> 20

D --> 21
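To make the comparison concrete, here is a quick sketch computing the expected length of both codes (the probabilities and codeword lengths are taken from the listings above; the variable names are my own):

```python
# Symbol probabilities from the question.
probs = {"A": 0.4, "B": 0.25, "C": 0.2, "D": 0.15}

# Codeword lengths of the code produced by naive 3-at-a-time merging.
naive_lengths = {"A": 1, "B": 2, "C": 2, "D": 2}
# Codeword lengths of the alternative code.
better_lengths = {"A": 1, "B": 1, "C": 2, "D": 2}

# Expected length = sum over symbols of probability * codeword length.
naive_len = sum(probs[s] * naive_lengths[s] for s in probs)    # 1.60 digits/symbol
better_len = sum(probs[s] * better_lengths[s] for s in probs)  # 1.35 digits/symbol
print(naive_len, better_len)
```

So the second code saves 0.25 ternary digits per symbol on average.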

Am I missing something here?

PS: I am posting this as a question because I cannot comment on the previous post.

Devil
  • Can you please explain the steps of how you get your example solution (A->0, B->10, etc.)? When I perform the Huffman algorithm with n=3 I get results similar to what you're expecting, not the ones you obtain. – Happington Jul 24 '14 at 17:43
  • I combine the three least probable characters B,C,D to get a supercharacter BCD of probability .6. Next, I have to find a Huffman code for A, and BCD with probabilities .4 and .6. I assign 0 to A and 1 to BCD. Then I unravel BCD to get 10 for B, 11 for C and 12 for D. – Devil Jul 24 '14 at 17:47
  • Ohhh I see where you're going wrong, I'll explain it in an answer, hold up. – Happington Jul 24 '14 at 17:49
  • Sorry for the incorrect information, I have updated my answer with a more correct version and a proper reference. – Happington Jul 24 '14 at 18:17

1 Answer

The Wikipedia article pointed to says: "Note that for n greater than 2, not all sets of source words can properly form an n-ary tree for Huffman coding. In this case, additional 0-probability place holders must be added." For a 3-ary tree, the next full tree after 3 leaves has 5 leaves, so I think you should add a 0-probability character before running the 3-ary Huffman algorithm. This makes {0, C, D} the first merged group, which produces the encoding you prefer.
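Here is a sketch of the padded n-ary Huffman procedure described above (the function name and tree representation are my own; codeword digits may differ from your preferred code depending on tie-breaking, but the codeword lengths come out the same):

```python
import heapq
from itertools import count

def nary_huffman(freqs, n=3):
    """Sketch: n-ary Huffman coding with 0-probability placeholder padding.

    freqs maps symbols to probabilities; returns a dict symbol -> codeword.
    """
    # Pad with placeholders (None) so the symbol count is congruent
    # to 1 modulo n-1, per the Wikipedia note quoted above.
    d = (1 - len(freqs)) % (n - 1)
    tie = count()  # unique tiebreaker so the heap never compares trees
    heap = [(p, next(tie), sym) for sym, p in freqs.items()]
    heap += [(0.0, next(tie), None) for _ in range(d)]
    heapq.heapify(heap)

    # Repeatedly merge the n lowest-probability trees into one node.
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(n)]
        total = sum(p for p, _, _ in group)
        heapq.heappush(heap, (total, next(tie), [t for _, _, t in group]))

    # Walk the tree, assigning digit i to the i-th child of each node.
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, list):
            for i, child in enumerate(tree):
                walk(child, prefix + str(i))
        elif tree is not None:  # skip the 0-probability placeholders
            codes[tree] = prefix
    _, _, root = heap[0]
    walk(root, "")
    return codes

print(nary_huffman({"A": 0.4, "B": 0.25, "C": 0.2, "D": 0.15}))
```

On the question's example this pads to 5 symbols, merges {placeholder, D, C} first, and yields codeword lengths 1, 1, 2, 2 for A, B, C, D, i.e. the 1.35 expected length of the preferred code.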

mcdowella
  • So the right thing to do would be to add 0-probability alphabets to get the alphabet size equal to the number of leaves a full 3-ary tree can have? – Devil Jul 24 '14 at 18:07
  • While this would make the tree look nicer, does the original specification for a Huffman encoding scheme allow the creation of dummy characters? – Happington Jul 24 '14 at 18:19
  • The Wikipedia article makes it easy to work out what total number of characters to aim for: "If the number of source words is congruent to 1 modulo n-1, then the set of source words will form a proper Huffman tree". Since the extra characters are allotted 0 probability and never occur, I don't see why they would cause a problem. If you really don't like imaginary characters, you could think of them as never-used worse alternate encodings of one of the existing characters. – mcdowella Jul 24 '14 at 19:53
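The congruence rule in that comment can be sketched as a one-liner (the function name is my own):

```python
def dummies_needed(k, n):
    # Placeholders required so that k + d is congruent to 1 (mod n-1),
    # i.e. the padded symbol count forms a proper n-ary Huffman tree.
    return (1 - k) % (n - 1)

# The question's example: 4 symbols, ternary code -> 1 placeholder,
# giving the 5 leaves of the next full 3-ary tree.
print(dummies_needed(4, 3))
```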