
I am writing code that produces Huffman codes for sequences of symbols in a given alphabet. It does this by building a Huffman tree of nodes. Each node is either a leaf node, which holds a unique symbol from the alphabet, or a parent node, whose symbol is set to None. Every node also has a code representing the path from the root to that node.
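A minimal sketch of the node structure described above. It assumes a hypothetical `Node` class whose `children` list holds the 0 branch first and the 1 branch second (the question does not show its class). `assign_codes` and `leaf_codes` are illustrative helpers, not part of the original code. They show that each node's `code` is simply the branch labels along the path from the root.

```python
class Node:
    """Hypothetical node: leaves carry a symbol, parents have symbol None."""
    def __init__(self, symbol=None, children=None):
        self.symbol = symbol
        self.children = children or []  # children[0] is the "0" branch, children[1] the "1" branch
        self.code = ""                   # path from the root, filled in by assign_codes

def assign_codes(node, code=""):
    """Give every node the string of branch labels on the path from the root to it."""
    node.code = code
    for bit, child in enumerate(node.children):
        assign_codes(child, code + str(bit))

def leaf_codes(node):
    """Collect {symbol: code} for every leaf below node."""
    if node.symbol is not None:
        return {node.symbol: node.code}
    codes = {}
    for child in node.children:
        codes.update(leaf_codes(child))
    return codes

# A small tree: leaf "A" on the 0 branch, a parent with leaves "B" and "C" on the 1 branch.
root = Node(children=[Node("A"), Node(children=[Node("B"), Node("C")])])
assign_codes(root)
print(leaf_codes(root))  # {'A': '0', 'B': '10', 'C': '11'}
```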

I am now attempting to write a function that decodes an encoded sequence of symbols by doing the following:

  • set the initial decoded sequence to ""
  • recursively iterate through each level of the tree, beginning at the root
  • at each leaf node, check whether the node's code equals the first x characters of the encoded sequence, where x is the length of that code
  • if they are equal, append the node's symbol to the decoded sequence and remove those first x characters from the encoded string
  • restart the recursive search from the root of the tree, matching against the new start of the encoded string
  • at each parent node continue the recursive search through its children
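For comparison with the steps above, which match each leaf's whole code against the front of the string, a more common way to decode a prefix code is to walk the tree one bit at a time: follow the `0` or `1` branch, and when a leaf is reached, emit its symbol and restart at the root. This is a swapped-in technique, not the author's method. The sketch assumes a hypothetical `Node` whose `children` are `[0-branch, 1-branch]` and a coded sequence made only of the characters `0` and `1`; `decode_bits` is an illustrative name.

```python
class Node:
    def __init__(self, symbol=None, children=None):
        self.symbol = symbol            # None for parent nodes
        self.children = children or []  # [0-branch, 1-branch] for parents, [] for leaves

def decode_bits(root, coded_sequence):
    """Decode by following one branch per bit and restarting at the root after each leaf."""
    decoded = []
    node = root
    for bit in coded_sequence:
        node = node.children[int(bit)]   # "0" -> children[0], "1" -> children[1]
        if not node.children:            # reached a leaf: emit its symbol, go back to the root
            decoded.append(node.symbol)
            node = root
    if node is not root:
        raise ValueError("coded_sequence ends in the middle of a code")
    return "".join(decoded)

# Leaf codes in this small tree: A=0, B=10, C=11.
root = Node(children=[Node("A"), Node(children=[Node("B"), Node("C")])])
print(decode_bits(root, "010110"))  # ABCA
```

Because it consumes each bit exactly once and uses a loop instead of one recursive call per symbol, this version also avoids Python's recursion limit on long inputs.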

Here is my code where I have attempted this recursive search:

def decode(root, current, coded_sequence):    # Initially called as decode(root, root, coded_sequence)
    decoded_sequence = ""
    for child in current.children:
        if child.symbol and child.code == coded_sequence[:len(child.code)]:
            decoded_sequence += child.symbol
            coded_sequence = coded_sequence[len(child.code):]    # Remove this matching code from the beginning of the coded sequence
            decoded_sequence += decode(root, root, coded_sequence) # Go back to the root of the tree with the new shortened coded_sequence
        if child.children:
            decoded_sequence += decode(root, child, coded_sequence) # Keep searching this child's subtree
    return decoded_sequence

My algorithm almost works: decoded_sequence starts with the correct decoded sequence, but that is followed by fragments from the end of the decoded sequence, and I can't figure out why. Why does my function keep going after the point where I thought coded_sequence would be empty?

Here is a sample output:

[Image: sample output]

Here is my best representation of the tree used in this example:

   Root
 0/  1|
X6  None
 0/  1|
X5  None
 0/  1|
X4  None
 0/  1|
X3  None
 0/  1|
X2   X1
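Since the comments ask for something runnable, here is a minimal reproduction. The `Node` class and `build_example_tree` helper are hypothetical reconstructions of the tree drawn above (children stored as `[0-branch, 1-branch]`), and `decode` is the function from the question with its logic unchanged. Decoding `"010"` (that is, `X6` followed by `X5`) shows the symptom: the correct result is followed by a repeat of its tail.

```python
class Node:
    def __init__(self, symbol=None, code="", children=None):
        self.symbol = symbol            # None for parent nodes
        self.code = code                # path from the root
        self.children = children or []  # [0-branch, 1-branch] for parents, [] for leaves

def build_example_tree():
    """Hypothetical rebuild of the tree above: X6=0, X5=10, X4=110, X3=1110, X2=11110, X1=11111."""
    node = Node(code="1111", children=[Node("X2", "11110"), Node("X1", "11111")])
    for prefix, symbol in [("111", "X3"), ("11", "X4"), ("1", "X5"), ("", "X6")]:
        node = Node(code=prefix, children=[Node(symbol, prefix + "0"), node])
    return node

def decode(root, current, coded_sequence):  # the question's function, logic unchanged
    decoded_sequence = ""
    for child in current.children:
        if child.symbol and child.code == coded_sequence[:len(child.code)]:
            decoded_sequence += child.symbol
            coded_sequence = coded_sequence[len(child.code):]  # Remove this matching code from the beginning
            decoded_sequence += decode(root, root, coded_sequence)
        if child.children:
            decoded_sequence += decode(root, child, coded_sequence)
    return decoded_sequence

root = build_example_tree()
print(decode(root, root, "010"))  # X6X5X5 (the correct answer is X6X5)
```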

SOLUTION

I thought it would look cleaner if I changed

coded_sequence = coded_sequence[len(child.code):]
decoded_sequence += decode(root, root, coded_sequence)

to

decoded_sequence += decode(root, root, coded_sequence[len(child.code):])

and this completely fixed the problem, although I can't get my head around why...
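Why the change works, following c..'s comment: `coded_sequence = coded_sequence[len(child.code):]` rebinds the local name for the rest of the `for` loop, not just for the recursive call. In the root-level call for `"010"`, the loop matches `X6`, shortens `coded_sequence` to `"10"` and decodes that from the root (giving `X5`). It then moves on to the sibling `None` node and searches that subtree with the already-shortened `"10"`, so `X5` is decoded a second time. Passing the slice directly leaves `coded_sequence` untouched for the remaining siblings, and because Huffman codes are prefix-free, at most one leaf in the whole tree can match the front of that string, so no symbol is emitted twice. The sketch below uses the same hypothetical `Node` class and tree helper as a reconstruction of the question's setup.

```python
class Node:
    def __init__(self, symbol=None, code="", children=None):
        self.symbol = symbol            # None for parent nodes
        self.code = code                # path from the root
        self.children = children or []  # [0-branch, 1-branch] for parents, [] for leaves

def build_example_tree():
    """Hypothetical rebuild of the question's tree: X6=0, X5=10, ..., X2=11110, X1=11111."""
    node = Node(code="1111", children=[Node("X2", "11110"), Node("X1", "11111")])
    for prefix, symbol in [("111", "X3"), ("11", "X4"), ("1", "X5"), ("", "X6")]:
        node = Node(code=prefix, children=[Node(symbol, prefix + "0"), node])
    return node

def decode(root, current, coded_sequence):  # initially called as decode(root, root, coded_sequence)
    decoded_sequence = ""
    for child in current.children:
        if child.symbol and child.code == coded_sequence[:len(child.code)]:
            decoded_sequence += child.symbol
            # Decode the remainder from the root WITHOUT rebinding coded_sequence,
            # so siblings checked later in this loop still see the original string.
            decoded_sequence += decode(root, root, coded_sequence[len(child.code):])
        if child.children:
            # Keep searching this child's subtree against the same, unshortened string.
            decoded_sequence += decode(root, child, coded_sequence)
    return decoded_sequence

root = build_example_tree()
print(decode(root, root, "010"))  # X6X5
```

Each decoded symbol adds another level of recursion, so very long inputs can still hit Python's default recursion limit; an iterative loop avoids that.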

KOB
  • Please read and follow the posting guidelines in the help documentation. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described. – Prune May 04 '17 at 00:39
  • It looks like it's possible for the algorithm to pass the same `coded_sequence` into the next child search in the same iteration it goes back to the root and starts again. Try it with `elif child.children:` instead of `if`? – c.. May 04 '17 at 00:44
  • @c.. Still seems to be the same or similar with that change, unfortunately – KOB May 04 '17 at 00:50
  • @KOB Your solution changes your algorithm so that you pass `coded_sequence` instead of `coded_sequence[len(child.code):]` into `if child.children: decoded_sequence += decode(root, child, coded_sequence)` – c.. May 04 '17 at 00:57
  • @c.. Very good, I see now what's going on. – KOB May 04 '17 at 01:00
  • Can you supply code that results in the sample output? – innisfree May 04 '17 at 01:50
