I was looking at constructing optimal Huffman codes over non-binary alphabets.

This question was asked in "Huffman trees for non-binary alphabets?". The suggested solution was to use the Huffman coding procedure, combining the n lowest-frequency symbols at a time (as Wikipedia also suggests). However, this does not seem to be optimal. Say I have an alphabet of 4 symbols with frequencies:

 A --> 0.4

 B --> 0.25

 C --> 0.2

 D --> 0.15

The ternary Huffman code derived using this procedure would be:

A --> 0

B --> 10

C --> 11

D --> 12

However, the following code has a shorter expected length:

A --> 0

B --> 1

C --> 20

D --> 21
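To make the comparison concrete, here is a quick sketch computing the expected length of both codes (the probabilities and codeword lengths are taken from the listings above; the variable names are my own):

```python
# Symbol probabilities from the question.
probs = {"A": 0.4, "B": 0.25, "C": 0.2, "D": 0.15}

# Codeword lengths of the code produced by naive 3-at-a-time merging.
naive_lengths = {"A": 1, "B": 2, "C": 2, "D": 2}
# Codeword lengths of the alternative code.
better_lengths = {"A": 1, "B": 1, "C": 2, "D": 2}

# Expected length = sum over symbols of probability * codeword length.
naive_len = sum(probs[s] * naive_lengths[s] for s in probs)    # 1.60 digits/symbol
better_len = sum(probs[s] * better_lengths[s] for s in probs)  # 1.35 digits/symbol
print(naive_len, better_len)
```

So the second code saves 0.25 ternary digits per symbol on average.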

Am I missing something here?

PS: I am posting this as a question because I cannot comment on the previous post.

Devil
  • Can you please explain the steps of how you get your example solution (A->0, B->10, etc.)? When I perform the Huffman algorithm with n=3 I get results similar to what you're expecting, not the ones you obtain. – Happington Jul 24 '14 at 17:43
  • I combine the three least probable characters B,C,D to get a supercharacter BCD of probability .6. Next, I have to find a Huffman code for A, and BCD with probabilities .4 and .6. I assign 0 to A and 1 to BCD. Then I unravel BCD to get 10 for B, 11 for C and 12 for D. – Devil Jul 24 '14 at 17:47
  • Ohhh I see where you're going wrong, I'll explain it in an answer, hold up. – Happington Jul 24 '14 at 17:49
  • Sorry for the incorrect information, I have updated my answer with a more correct version and a proper reference. – Happington Jul 24 '14 at 18:17

1 Answer

The Wikipedia article pointed to says: "Note that for n greater than 2, not all sets of source words can properly form an n-ary tree for Huffman coding. In this case, additional 0-probability place holders must be added." For a 3-ary tree, the next full tree after 3 leaves has 5 leaves, so I think you should add a 0-probability character before running the 3-ary Huffman algorithm. This makes {0, C, D} the first merged group, which produces the encoding you prefer.
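Here is a sketch of the padded n-ary Huffman procedure described above (the function name and tree representation are my own; codeword digits may differ from your preferred code depending on tie-breaking, but the codeword lengths come out the same):

```python
import heapq
from itertools import count

def nary_huffman(freqs, n=3):
    """Sketch: n-ary Huffman coding with 0-probability placeholder padding.

    freqs maps symbols to probabilities; returns a dict symbol -> codeword.
    """
    # Pad with placeholders (None) so the symbol count is congruent
    # to 1 modulo n-1, per the Wikipedia note quoted above.
    d = (1 - len(freqs)) % (n - 1)
    tie = count()  # unique tiebreaker so the heap never compares trees
    heap = [(p, next(tie), sym) for sym, p in freqs.items()]
    heap += [(0.0, next(tie), None) for _ in range(d)]
    heapq.heapify(heap)

    # Repeatedly merge the n lowest-probability trees into one node.
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(n)]
        total = sum(p for p, _, _ in group)
        heapq.heappush(heap, (total, next(tie), [t for _, _, t in group]))

    # Walk the tree, assigning digit i to the i-th child of each node.
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, list):
            for i, child in enumerate(tree):
                walk(child, prefix + str(i))
        elif tree is not None:  # skip the 0-probability placeholders
            codes[tree] = prefix
    _, _, root = heap[0]
    walk(root, "")
    return codes

print(nary_huffman({"A": 0.4, "B": 0.25, "C": 0.2, "D": 0.15}))
```

On the question's example this pads to 5 symbols, merges {placeholder, D, C} first, and yields codeword lengths 1, 1, 2, 2 for A, B, C, D, i.e. the 1.35 expected length of the preferred code.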

mcdowella
  • So the right thing to do would be to add 0-probability alphabets to get the alphabet size equal to the number of leaves a full 3-ary tree can have? – Devil Jul 24 '14 at 18:07
  • While this would make the tree look nicer, does the original specification for a Huffman encoding scheme allow the creation of dummy characters? – Happington Jul 24 '14 at 18:19
  • The Wikipedia article makes it easy to work out what total number of characters to aim for: "If the number of source words is congruent to 1 modulo n-1, then the set of source words will form a proper Huffman tree". Since the extra characters are allotted 0 probability and never occur, I don't see why they would cause a problem. If you really don't like imaginary characters, you could think of them as never-used worse alternate encodings of one of the existing characters. – mcdowella Jul 24 '14 at 19:53
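The congruence rule in that comment can be sketched as a one-liner (the function name is my own):

```python
def dummies_needed(k, n):
    # Placeholders required so that k + d is congruent to 1 (mod n-1),
    # i.e. the padded symbol count forms a proper n-ary Huffman tree.
    return (1 - k) % (n - 1)

# The question's example: 4 symbols, ternary code -> 1 placeholder,
# giving the 5 leaves of the next full 3-ary tree.
print(dummies_needed(4, 3))
```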