Convention for Huffman Coding

Question

Is there a convention for generating a Huffman encoding for a certain alphabet? It seems like the resultant encoding depends both on whether you assign '0' to the left child or the right child as well as how you determine which symbol will go to the left tree.

Wikipedia says that:

As a common convention, bit '0' represents following the left child and bit '1' represents following the right child.

So that is an answer to the first half of the variance. However, I couldn't find any convention for the second half. I would assume something like making the node with lower probability go on the left, but several example Huffman trees online don't do this.

For example:

huffman tree

So is there a convention for the assignment of nodes to left and right, or is it up to the implementation?

I apologize if this is a duplicate, but I wasn't able to find an answer.

I think the only "convention" is in the algorithms we've selected as "standard", i.e. gzip. — Jonathon Reinhart, Nov 16 '15 at 03:22
In that case, does it matter? Will there ever be a case where choosing one would yield a less efficient code than would result from choosing the other? (maybe this should be a new question) — andars, Nov 16 '15 at 03:28

Mark Adler · Accepted Answer · 2015-11-16T20:16:01.570

Yes, in fact there is. Not so much a convention for interoperability, but rather for encoding efficiency. It's called Canonical Huffman, where the codes are assigned in numerical order from the shortest codes to the longest codes, and within a single code length, they are assigned in a lexicographical order on the symbols. This permits transmitting only the length of the code for each symbol, as opposed to the entire tree structure.

Generally what is done is to use the Huffman algorithm tree only to determine the number of bits for each symbol. The tree is then discarded. Bit values are never assigned to the branches. The codes are then built directly from the lengths, using the ordering above.

score -1 · Answer 2 · answered Aug 15 '21 at 02:28

Check it out

        class Nodo{
      constructor(v=null,f=null,l=null,r=null){
            this.f=f
            this.v=v
            this.l=l
            this.r=r
          }
      }
        function EnCrypt(text){
            let lista=[]
            for(let i=0;i<text.length;i++){//Create the list with the appearances 
                !lista.find(e => e.v===text[i]) && lista.push(new Nodo(text[i],(text.match(new RegExp(text[i],"g")) || []).length))
            }
            lista=lista.sort((a,b)=>a.f-b.f)//Order from smallest to largest
            //----------------------------------------------------
            function createNew(){//Create the tree
                let nodos=lista.splice(0,2)
                lista.push(new Nodo(null,nodos[0].f+nodos[1].f,nodos[0],nodos[1]))
                if(lista.length==1)return lista
                createNew()
            }createNew()
          ///-----------------------------------------------
            let Arbol=lista[0]
            function Codigo(nodo,c=""){//recursively traverse the tree 
                if(!nodo.l && !nodo.r)return [nodo.v,c]
                return Codigo(nodo.l,c+"0")+";"+Codigo(nodo.r,c+"1")
            }
           //-----------------------------------------------
            const codigo=(Codigo(Arbol)).split(";")
            let finish=""
            text.split("").map(t=>{
                codigo.map(e => {
                    if(e.split(",")[0]==t)finish+=e.split(",")[1]
                })
            })
            return {
                cod:finish,
                dicc:codigo
            }
        }
        function DeCrypt(key,res=""){
            let {cod,dicc}=key
            let temp=""
            for(let i=0;i<=cod.length;i++){
                temp+=cod.substr(i,1)
               dicc.map((d)=>{
                    d=d.split(",")
                    if(temp==d[1]){
                        res+=d[0]
                        temp=""
                        cod=cod.substr(i)
                        i=0
                    }
                })
            }
            return res
        }
        function Huffman(){
                const text=document.querySelector("#newValue").value
                const comp= EnCrypt(text)
                document.querySelector(".res").innerHTML=JSON.stringify(comp,null, 4)
            }
        function HuffmanDecode(){
                const text=JSON.parse(document.querySelector("#huffman").value)
                const comp= DeCrypt(text)
                document.querySelector(".res2").innerHTML=comp
            }

<h1></h1>
<input type="text"
       placeholder="set value (min 2 chars)" id="newValue">
<button onclick="Huffman()">Go</button>
<div class="res" style="white-space: pre-wrap;"></div>

<input type="text" placeholder="paste the result from above" id="huffman"><button onclick="HuffmanDecode()">decode</button>
 <div class="res2" style="white-space: pre-wrap;"></div>

Thanks for sharing the code here. Perhaps I’m missing something, but how does this address the OP’s question about the different meanings of the 0 and 1 child? — templatetypedef, Aug 15 '21 at 02:33
The nodes are ordered according to their weight from lowest to highest in a list, therefore, the lowest weight would always be the left one and in the tree it would be left child (0) right child (1), sorry if I'm wrong — David190, Aug 15 '21 at 02:46
I had interpreted the question to mean “does it matter how you assign trees to the left or right subtrees?” rather than “how do you assign them?” But perhaps that’s me misreading things. — templatetypedef, Aug 15 '21 at 12:49
instead of just the code snippet you should explain how it answers the OP's question — v2v1, Aug 17 '21 at 20:53

Convention for Huffman Coding

2 Answers2