
On Wikipedia, the construction of a Huffman tree is described as follows:

The simplest construction algorithm uses a priority queue where the node with lowest probability is given highest priority:

  1. Create a leaf node for each symbol and add it to the priority queue.

  2. While there is more than one node in the queue:

    1. Remove the two nodes of highest priority (lowest probability) from the queue

    2. Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.

    3. Add the new node to the queue.

  3. The remaining node is the root node and the tree is complete.
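For reference, the quoted steps can be sketched in Python roughly as follows. This is only a minimal illustration, not Wikipedia's code: the function name `huffman_code_lengths`, the tuple-based tree representation, and the frequencies in the usage note are all assumptions made for the example. It uses `heapq` as the priority queue and returns only the depth of each leaf, which is the length of that symbol's code.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a dict of symbol weights.

    Hypothetical helper: the name and tree layout are assumptions.
    """
    if not freqs:
        return {}
    tiebreak = count()  # unique counter so equal weights never compare trees
    # Step 1: one leaf per symbol, all pushed into the priority queue.
    heap = [(w, next(tiebreak), ("leaf", sym)) for sym, w in freqs.items()]
    heapq.heapify(heap)
    # Step 2: repeatedly merge the two lowest-weight nodes.
    while len(heap) > 1:
        w1, _, first = heapq.heappop(heap)
        w2, _, second = heapq.heappop(heap)
        merged = ("node", first, second)  # which child goes left is arbitrary
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    # Step 3: the remaining node is the root; leaf depth = code length.
    lengths = {}

    def walk(tree, depth):
        if tree[0] == "leaf":
            lengths[tree[1]] = max(depth, 1)  # a lone symbol still needs 1 bit
        else:
            walk(tree[1], depth + 1)
            walk(tree[2], depth + 1)

    walk(heap[0][2], 0)
    return lengths
```

With made-up weights such as `{"a1": 40, "a2": 35, "a3": 20, "a4": 5}`, this gives lengths 1, 2, 3 and 3. The depth of a leaf does not depend on whether `first` or `second` is treated as the left child, so swapping them leaves every length unchanged.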

But their example for deconstruction (assigning 0 and 1) is a bit weird:

[image: Wikipedia's example Huffman tree]

It's easy to see that, apart from the leaf nodes, the frequencies of the intermediate nodes don't seem to matter much. For instance, a3 > a4, and a 0 is appended to a3's string while a 1 is appended to a4's. But a1 < a2 + a3 + a4, and the same thing is done. So, do the frequencies matter?

The same thing happens in videos from Leios Lab and Reducible (two well-known math/programming YouTubers), and also in these Stack Overflow questions:

So here are my questions:

  1. Does it matter if child nodes are placed left/right randomly?

  2. Whether it matters or not, is there a standard? For example: lower node on the right, higher node on the left, lower gets a 1, higher gets a 0 (or any other combination of these four choices)? It probably doesn't affect efficiency as long as you're consistent (as all three answers above indicate), but is there a common consensus?

For example, this online generator follows the "lower left, higher right, lower 0, higher 1" rule, but others may follow a different rule.

P.S.: Note that the second question has two parts: the left/right choice relates to construction, while the 0/1 choice relates to assigning values. Normally you would do both in the same Huffman function anyway, but I'm creating an animation of this and the order probably does matter visually.

silverfox
  • 1,568
  • 10
  • 27
  • *"Noted that the second question is two-part"*: there should be only one question. Now it is too broad. – trincot Jul 30 '22 at 15:46
  • @trincot I mean, those two part are related, so I combine them into one. The (more) general question there is "if everything work, consistent or not, what steps do I follow?" – silverfox Jul 30 '22 at 15:50
  • But you write *these* (plural) are my questions. There should be just one question. – trincot Jul 30 '22 at 15:51
  • @trincot So I should delete the P.S to avoid confusion is what you mean here? Or the second question altogether? – silverfox Jul 30 '22 at 15:53
  • There shouldn't be "here are my questions: 1. .... 2. ....". Just one question. – trincot Jul 30 '22 at 15:53
  • @trincot The two questions here are closely related; is that not enough for them to be in the same post? I'm very sorry but I can't quite grasp the situation here. – silverfox Jul 30 '22 at 15:57
  • There is a closure reason that reads *"This question currently includes multiple questions in one. It should focus on one problem only."* – trincot Jul 30 '22 at 16:00
  • @trincot What I mean is, if I delete everything and reword the above question as "Does it matter if children nodes are place left/right randomly, and what is the common practice when placing?", would it still be two question? – silverfox Jul 30 '22 at 16:05
  • I'll just comment: it doesn't matter whether to attach a node as left or right child, and it doesn't matter which you label 0 or 1, and there is no consensus (but that latter will be a matter of opinion, as some will have strong feelings towards a certain choice) – trincot Jul 30 '22 at 16:06
  • @trincot Thank you. So what you meant previously is that the second question is more of an opinion, and it should be removed? Because, as far as I'm concerned, the 2 questions I asked is closely related to each other and to the original problem described. The [following post](https://meta.stackoverflow.com/questions/275908/more-than-one-question-per-post) also discuss that. – silverfox Jul 30 '22 at 16:15
  • For the first question, I have posted an answer where you found a [wrong answer](https://stackoverflow.com/questions/22352202/which-node-go-in-left-or-right-on-addition-of-weight-while-huffman-tree-creation?rq=1) – trincot Jul 30 '22 at 16:24

1 Answer

  1. No. The code will still be optimal. (The accepted answer to the linked question is wrong!)

  2. No. The standard, to the extent that there is one, is to ignore the resulting tree entirely and use only the resulting code lengths (in bits) for each symbol, together with a well-defined lexicographic ordering of the symbols, to generate the actual ones and zeros of the codes. This is called a canonical Huffman code. There are many possible assignments of ones and zeros to the symbol codes, given that the zero and the one can be arbitrarily assigned to left or right at every branch. Using a canonical procedure establishes an agreement between the encoder and decoder on a single code for any given set of lengths and symbols. This not only sets a standard for the zeros and ones but, importantly, since the purpose is compression, reduces the amount of information that needs to be sent from the encoder to the decoder in order to describe the code.
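To make the second point concrete, here is a rough sketch of how codes can be generated from lengths alone. It follows one common convention (the one used by Deflate): shorter codes come first, and symbols of equal length are taken in sorted order. The function name `canonical_codes` and the example lengths are assumptions for illustration only.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codes from {symbol: code length in bits}.

    Hypothetical helper: shorter codes first, ties broken by symbol order.
    """
    codes = {}
    code = 0
    prev_len = 0
    # Encoder and decoder only need to agree on this sort order.
    ordered = sorted(lengths.items(), key=lambda kv: (kv[1], kv[0]))
    for sym, length in ordered:
        # Moving to a longer length appends zero bits to the running value.
        code <<= length - prev_len
        codes[sym] = format(code, "0{}b".format(length))
        code += 1  # next code of the same length is the next integer
        prev_len = length
    return codes


if __name__ == "__main__":
    # Example lengths are made up; no tree is needed at this point.
    print(canonical_codes({"a1": 1, "a2": 2, "a3": 3, "a4": 3}))
    # prints {'a1': '0', 'a2': '10', 'a3': '110', 'a4': '111'}
```

Since the decoder can run the same procedure, only the list of code lengths has to be transmitted, not the tree or the left/right and 0/1 choices made while building it.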

Mark Adler
  • 101,978
  • 13
  • 118
  • 158