I am writing code that produces Huffman codes for sequences of symbols in a given alphabet. It does this through an algorithm that builds a Huffman tree of nodes. Each node is either a leaf node holding a unique symbol from this alphabet, or a parent node whose symbol is set to None. Every node has a code representing the path from the root to that node.
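For concreteness, here is a minimal sketch of the kind of node structure described above. The `Node` class, the `assign_codes` helper, and the three-symbol tree are assumptions made for illustration; they are not the actual classes from my project.

```python
class Node:
    """Hypothetical tree node: leaves carry a symbol, parents carry None."""

    def __init__(self, symbol=None, children=None):
        self.symbol = symbol
        self.children = children or []
        self.code = ""  # path from the root, filled in by assign_codes


def assign_codes(node, prefix=""):
    """Label each node with the 0/1 path leading to it from the root."""
    node.code = prefix
    for bit, child in enumerate(node.children):
        assign_codes(child, prefix + str(bit))


# Small illustrative tree: A is a leaf under the root, B and C share a parent.
root = Node(children=[Node("A"), Node(children=[Node("B"), Node("C")])])
assign_codes(root)

leaf_b = root.children[1].children[0]
print(leaf_b.symbol, leaf_b.code)  # B 10
```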
I am now attempting to write a function that decodes an encoded sequence of symbols by doing the following:
- set the initial decoded sequence to ""
- recursively iterate through each level of the tree, beginning at the root
- at each leaf node, check if the code of that node is equal to the first x characters of the encoded sequence (x being the length of the code at the current node)
  - if they are equal, append this symbol to the decoded sequence and remove these first x characters from the encoded string
  - begin the recursive search through the tree again from its root for the new first x characters of the encoded string
- at each parent node, continue the recursive search through its children
Here is my code where I have attempted this recursive search:
    def decode(root, current, coded_sequence): # Initially called as decode(root, root, coded_sequence)
        decoded_sequence = ""
        for child in current.children:
            if child.symbol and child.code == coded_sequence[:len(child.code)]:
                decoded_sequence += child.symbol
                coded_sequence = coded_sequence[len(child.code):] # Remove this matching code from the beginning of the coded sequence
                decoded_sequence += decode(root, root, coded_sequence)
            if child.children:
                decoded_sequence += decode(root, child, coded_sequence) # Go back to the root of the tree with the new shortened coded_sequence
        return decoded_sequence
My algorithm mostly works: decoded_sequence has the correct decoded sequence at the beginning, but it is followed by repeated sections from the end of the decoded sequence, and I can't figure out why. Why does my function keep going past the point where I thought coded_sequence would be empty?
Here is a sample output:
Here is my best representation of the tree used in this example:
            Root
          0/   1|
         X6    None
             0/   1|
            X5    None
                0/   1|
               X4    None
                   0/   1|
                  X3    None
                      0/   1|
                     X2    X1
SOLUTION
I thought it would look cleaner if I changed

    coded_sequence = coded_sequence[len(child.code):]
    decoded_sequence += decode(root, root, coded_sequence)

to

    decoded_sequence += decode(root, root, coded_sequence[len(child.code):])

and this completely fixed the problem, which I cannot get my head around...
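A likely explanation is that the reassignment happens inside the `for` loop. Once a leaf matches, `coded_sequence` is rebound to the shortened string for the rest of that call, so any sibling processed afterwards recurses into its subtree with the shortened string and decodes the tail a second time. Passing the slice directly leaves the local `coded_sequence` untouched; the siblings then see the original string, which already begins with the matched leaf's code, and since Huffman codes are prefix-free no other leaf can match it. The sketch below reproduces both behaviours on a small hypothetical tree; the `Node` class and the names `decode_rebinding` and `decode_slicing` are assumptions for illustration only.

```python
class Node:
    """Hypothetical stand-in for the tree nodes in the question."""

    def __init__(self, symbol=None, code="", children=None):
        self.symbol = symbol
        self.code = code
        self.children = children or []


def decode_rebinding(root, current, coded_sequence):
    """Original version: rebinds coded_sequence inside the loop."""
    decoded_sequence = ""
    for child in current.children:
        if child.symbol and child.code == coded_sequence[:len(child.code)]:
            decoded_sequence += child.symbol
            coded_sequence = coded_sequence[len(child.code):]
            decoded_sequence += decode_rebinding(root, root, coded_sequence)
        if child.children:
            # Later siblings now search the shortened sequence a second time.
            decoded_sequence += decode_rebinding(root, child, coded_sequence)
    return decoded_sequence


def decode_slicing(root, current, coded_sequence):
    """Changed version: the local coded_sequence is never modified."""
    decoded_sequence = ""
    for child in current.children:
        if child.symbol and child.code == coded_sequence[:len(child.code)]:
            decoded_sequence += child.symbol
            decoded_sequence += decode_slicing(
                root, root, coded_sequence[len(child.code):]
            )
        if child.children:
            # Siblings still see the original sequence, which cannot match here.
            decoded_sequence += decode_slicing(root, child, coded_sequence)
    return decoded_sequence


# Codes: A = "0", B = "10", C = "11"
root = Node(children=[
    Node("A", "0"),
    Node(None, "1", [Node("B", "10"), Node("C", "11")]),
])

print(decode_rebinding(root, root, "010"))  # ABB (the trailing B is decoded twice)
print(decode_slicing(root, root, "010"))    # AB
```

In the first call, the leaf A matches and the rest ("10") is decoded to B; the loop then moves on to A's sibling with `coded_sequence` still equal to "10", which yields the extra B.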