Inconsistencies when creating Huffman tree

Question

On Wikipedia, the construction of a Huffman tree is described as such:

The simplest construction algorithm uses a priority queue where the node with lowest probability is given highest priority:

1. Create a leaf node for each symbol and add it to the priority queue.
2. While there is more than one node in the queue:
   1. Remove the two nodes of highest priority (lowest probability) from the queue.
   2. Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
   3. Add the new node to the queue.
3. The remaining node is the root node and the tree is complete.
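
The quoted steps can be sketched in Python roughly as follows. This is only an illustration, not Wikipedia's or any library's implementation: the function names, the nested-tuple tree representation, and the integer weights for a1 to a4 are assumptions chosen for the example.

```python
import heapq
from itertools import count


def build_huffman_tree(freqs):
    """Return the root of a Huffman tree built from {symbol: weight}.

    A leaf is a symbol (a str); an internal node is a (first, second) tuple.
    """
    tiebreak = count()  # unique counter so the heap never compares nodes
    heap = [(weight, next(tiebreak), sym) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Remove the two nodes with the lowest weight.
        w1, _, first = heapq.heappop(heap)
        w2, _, second = heapq.heappop(heap)
        # Create an internal node whose weight is the sum, and re-queue it.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (first, second)))
    return heap[0][2]  # the remaining node is the root


def code_lengths(node, depth=0):
    """Map each symbol to its depth in the tree (its code length in bits)."""
    if isinstance(node, str):
        return {node: max(depth, 1)}  # a lone symbol still needs one bit
    lengths = code_lengths(node[0], depth + 1)
    lengths.update(code_lengths(node[1], depth + 1))
    return lengths


# Illustrative weights only (assumed for this example).
freqs = {"a1": 40, "a2": 35, "a3": 20, "a4": 5}
lengths = code_lengths(build_huffman_tree(freqs))
print(lengths)  # a1 gets 1 bit, a2 gets 2 bits, a3 and a4 get 3 bits each
```

Note that the algorithm only decides which two nodes get merged; it says nothing about which of them becomes the left or right child.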

But their example for deconstruction (assigning 0 and 1) is a bit weird:

It seems that, apart from the leaf nodes, the frequencies of intermediate nodes don't matter much. For instance, a3 > a4, and 0 is appended to a3's string while 1 is appended to a4's. But a1 < a2 + a3 + a4, and the same thing is done. So, do the frequencies matter?

The same thing happens in videos from Leios Lab and Reducible (two well-known math/programming YouTubers), and also in these Stack Overflow questions:

Which node go in left or right on addition of weight while huffman tree creation (stating that you should be consistent when putting nodes left/right based on frequencies)
What happens if we are inconsistent while creating Huffman Tree? and Huffman Tree Coding (stating that while being inconsistent does change the resulting code, it doesn't matter, as the height stays the same)

So here are my questions:

1. Does it matter if child nodes are placed left/right randomly?
2. Whether or not it matters, is there a standard? For example: lower node right, higher node left, lower gets a 1, higher gets a 0 (or whichever combination of these four choices). It probably doesn't affect efficiency as long as you're consistent (as all three answers above indicate), but is there a common consensus?

For example, this online generator follows the lower left, higher right, lower 0, higher 1 rule, but others can follow another rule.
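
To make the comparison concrete, here is a small Python sketch that builds a code with the "lower left gets 0, higher right gets 1" rule and again with the children flipped at random. The function name, the `swap` hook, and the weights are all hypothetical choices made for this illustration, not taken from any particular generator.

```python
import heapq
import random
from itertools import count


def huffman_codes(freqs, swap=lambda: False):
    """Build a Huffman code from {symbol: weight}; symbols must be str.

    `swap()` is called once per merge; if it returns True, the two
    children are flipped before being attached as (left, right).
    """
    tiebreak = count()  # unique counter so the heap never compares nodes
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w_low, _, low = heapq.heappop(heap)
        w_high, _, high = heapq.heappop(heap)
        children = (high, low) if swap() else (low, high)
        heapq.heappush(heap, (w_low + w_high, next(tiebreak), children))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, str):
            codes[node] = prefix or "0"  # a lone symbol still needs one bit
            return
        walk(node[0], prefix + "0")  # left branch is labelled 0
        walk(node[1], prefix + "1")  # right branch is labelled 1

    walk(heap[0][2], "")
    return codes


def cost(freqs, codes):
    """Total weighted code length: sum of weight * bits per symbol."""
    return sum(freqs[sym] * len(code) for sym, code in codes.items())


freqs = {"a1": 40, "a2": 35, "a3": 20, "a4": 5}  # illustrative weights
fixed = huffman_codes(freqs)  # lower node left (0), higher node right (1)
rng = random.Random(0)
shuffled = huffman_codes(freqs, swap=lambda: rng.random() < 0.5)

# The bit patterns may differ, but every symbol keeps the same length,
# so the total cost is identical.
print(fixed, cost(freqs, fixed))
print(shuffled, cost(freqs, shuffled))
```

Flipping children only changes which bit labels each branch, not the depth of any leaf, which is why the cost stays the same in this sketch.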

P.S.: Noted that the second question is two-part, as the left/right problem relates to construction, while the 0/1 problem relates to assigning values. Normally you would do both in the same Huffman function anyway, but I'm creating an animation on this and the order probably does matter visually.

*"Noted that the second question is two-part"*: there should be only one question. Now it is too broad. — trincot, Jul 30 '22 at 15:46
@trincot I mean, those two parts are related, so I combined them into one. The (more) general question there is "if everything works, consistent or not, what steps do I follow?" — silverfox, Jul 30 '22 at 15:50
But you write *these* (plural) are my questions. There should be just one question. — trincot, Jul 30 '22 at 15:51
@trincot So I should delete the P.S to avoid confusion is what you mean here? Or the second question altogether? — silverfox, Jul 30 '22 at 15:53
There shouldn't be "here are my questions: 1. .... 2. ....". Just one question. — trincot, Jul 30 '22 at 15:53
@trincot The two questions here are closely related; is that not enough for them to be in the same post? I'm very sorry but I can't quite grasp the situation here. — silverfox, Jul 30 '22 at 15:57
There is a closure reason that reads *"This question currently includes multiple questions in one. It should focus on one problem only."* — trincot, Jul 30 '22 at 16:00
@trincot What I mean is, if I delete everything and reword the above question as "Does it matter if child nodes are placed left/right randomly, and what is the common practice when placing?", would it still be two questions? — silverfox, Jul 30 '22 at 16:05
I'll just comment: it doesn't matter whether you attach a node as the left or right child, and it doesn't matter which you label 0 or 1, and there is no consensus (but the latter will be a matter of opinion, as some will have strong feelings towards a certain choice) — trincot, Jul 30 '22 at 16:06
@trincot Thank you. So what you meant previously is that the second question is more a matter of opinion and should be removed? Because, as far as I can tell, the two questions I asked are closely related to each other and to the original problem described. The [following post](https://meta.stackoverflow.com/questions/275908/more-than-one-question-per-post) also discusses that. — silverfox, Jul 30 '22 at 16:15
For the first question, I have posted an answer where you found a [wrong answer](https://stackoverflow.com/questions/22352202/which-node-go-in-left-or-right-on-addition-of-weight-while-huffman-tree-creation?rq=1) — trincot, Jul 30 '22 at 16:24

Mark Adler · Accepted Answer · 2022-07-30T16:28:52.937

1. No. The code will still be optimal. (The accepted answer to the linked question is wrong!)
2. No. The standard, to the extent that there is one, is to ignore the resulting tree entirely and use only the resulting code lengths (in bits) for each symbol, together with a well-defined lexicographic ordering of the symbols, to generate the actual ones and zeros of the codes. This is called a Canonical Huffman code. There are many possible assignments of ones and zeros to the symbol codes, given that the zero and the one can be arbitrarily assigned to left or right at every branch. Using a canonical procedure establishes an agreement between the encoder and decoder on a single code for any given set of lengths and symbols. This not only sets a standard for the zeros and ones but, importantly, since the purpose is compression, reduces the amount of information that needs to be sent from the encoder to the decoder in order to describe the code.
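
As a rough illustration of the canonical procedure, the Python sketch below assigns codes from code lengths alone. It assumes one common convention (shorter codes first, ties broken by symbol order, each code one more than the previous, shifted left when the length grows); the function name and the example lengths are made up for this sketch, and other canonical orderings are possible.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codes from a {symbol: code length} mapping.

    Assumed convention: sort by (length, symbol); the first code is all
    zeros, and each next code is the previous one plus 1, shifted left
    whenever the code length increases.
    """
    if not lengths:
        return {}
    ordered = sorted(lengths.items(), key=lambda item: (item[1], item[0]))
    codes = {}
    code = 0
    prev_len = ordered[0][1]
    for sym, length in ordered:
        code <<= length - prev_len  # pad with zeros when the length grows
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


# Lengths as they might come out of any Huffman tree, however its
# children were arranged (illustrative values).
lengths = {"a1": 1, "a2": 2, "a3": 3, "a4": 3}
print(canonical_codes(lengths))
# {'a1': '0', 'a2': '10', 'a3': '110', 'a4': '111'}
```

With this approach the encoder only needs to transmit the code length of each symbol; the decoder runs the same procedure and arrives at the same bit patterns, regardless of how the original tree was laid out.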
