-2

A data file contains a sequence of 8-bit characters such that all 256 characters are about as common: the maximum character frequency is less than twice the minimum character frequency. Prove that Huffman coding in this case is not more efficient than using an ordinary 8-bit fixed-length code.

madth3
daniel pns
  • 1
    Where have you got to on this? What are your thoughts so far? What approaches to the problem have you considered? – Trevor Tippins Jan 22 '12 at 11:30
  • Actually, I didn't understand the question that well. Should I consider the frequencies of all 256 characters or only 8? – daniel pns Jan 22 '12 at 11:32
  • 1
    They're just saying that 8-bit bytes represent a domain of 256 characters in total (which is a bit anachronistic in today's world). In essence, because the values of those bytes have a more or less equal distribution, the bit sequences used in the Huffman tree to represent them are going to be just about as long as the byte values themselves. On top of this you'd also have to store the tree so the file could be decoded. Read up some more on Huffman Encoding! – Trevor Tippins Jan 22 '12 at 11:47

1 Answer

3

The proof is direct. Assume w.l.o.g. that the characters are sorted in ascending order of frequency, so f(1) <= f(2) <= ... <= f(256), and the hypothesis says f(256) < 2*f(1). Huffman first joins f(1) and f(2) into a node f'(1) = f(1) + f(2). Since f(2) >= f(1) and 2*f(1) > f(256), we have f'(1) > f(256), so f'(1) won't be joined with anything until after f(256) has been joined with something. By the same token, f(3) and f(4) are joined into f'(2) with f'(2) >= f'(1) > f(256). Continuing in this way, f(253) and f(254) are joined into f'(127) >= ... >= f'(1) > f(256), and finally f(255) and f(256) are joined into f'(128) >= f'(127) >= ... >= f'(1). Now, since f(256) < 2*f(1) <= f'(1) and f'(128) <= 2*f(256), we get f'(128) <= 2*f(256) < 4*f(1) <= 2*f'(1). Ergo, f'(128) < 2*f'(1), which is the same condition that held at the start of the first round of the Huffman algorithm.
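If it helps to see the first round concretely, here is a small Python sketch. The names `merge_round` and `nearly_uniform` and the random test frequencies are made up for illustration; the sketch simply assumes, as argued above, that the first round pairs the sorted frequencies off two at a time, and then checks that the 128 merged weights again satisfy "max < 2 * min".

```python
import random

def merge_round(weights):
    """Pair off the sorted weights (1st+2nd, 3rd+4th, ...).

    Under the "max < 2 * min" hypothesis this is what one round of
    Huffman merging does, per the argument above.
    """
    w = sorted(weights)
    return [w[i] + w[i + 1] for i in range(0, len(w), 2)]

def nearly_uniform(weights):
    """The hypothesis of the exercise: max frequency < 2 * min frequency."""
    return max(weights) < 2 * min(weights)

# Illustrative data only: 256 frequencies in [1000, 1999], so max < 2 * min.
random.seed(0)
freqs = [random.randint(1000, 1999) for _ in range(256)]

merged = merge_round(freqs)
print(len(merged), nearly_uniform(freqs), nearly_uniform(merged))
```

This checks one instance only, of course; the inequality chain above is what shows it for every input.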

Since the condition holds again after the first round, the same argument applies to every later round (by induction on the round number). Huffman will therefore perform 8 rounds, leaving 128, 64, 32, 16, 8, 4, 2 and finally 1 node, the root, at which point the algorithm terminates. Since at each stage each node is joined to another one which has, to that point, received the same treatment by the Huffman algorithm, every leaf ends up at the same depth: 8. Every character therefore gets an 8-bit codeword, so the Huffman-coded file is exactly as long as the fixed-length 8-bit encoding, not shorter.
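As a sanity check (not a substitute for the proof), you can run a textbook heap-based Huffman construction on such an input and look at the resulting code lengths. This is a minimal sketch: `huffman_code_lengths` is a name I made up, the frequencies are random illustrative data, and only the codeword lengths are computed, not the codewords themselves.

```python
import heapq
import random

def huffman_code_lengths(freqs):
    """Return the Huffman codeword length for each symbol index."""
    # Heap entries are (weight, tie_breaker, symbols_in_subtree).
    # The unique tie_breaker keeps the lists from ever being compared.
    heap = [(f, i, [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    lengths = [0] * len(freqs)
    counter = len(freqs)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Every symbol under the two merged subtrees gets one bit deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, counter, s1 + s2))
        counter += 1
    return lengths

# Illustrative data only: 256 frequencies with max < 2 * min.
random.seed(1)
freqs = [random.randint(1000, 1999) for _ in range(256)]

lengths = huffman_code_lengths(freqs)
huffman_bits = sum(f * n for f, n in zip(freqs, lengths))
fixed_bits = 8 * sum(freqs)
print(set(lengths), huffman_bits == fixed_bits)
```

For a skewed input such as `[1, 1, 2, 4]` the same function gives unequal lengths, which is the case where Huffman actually pays off.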

This is somewhat informal, more of a sketch than a proof, really, but it should be more than enough for you to write something more formal.

Patrick87