using the bitarray
module you get huffman en-/de-coding for free and probably more efficient than anything else:
from bitarray import bitarray
huffman_dict = {
'!': bitarray('01110'), 'B': bitarray('01111'),
'l': bitarray('10100'), 'q': bitarray('10110'),
'y': bitarray('10111')
}
a = bitarray()
a.encode(huffman_dict, 'Bylly!')
print(a)
dec = bitarray('011111011101110').decode(huffman_dict)
print(dec)
print(''.join(dec))
# # output:
# bitarray('011111011110100101001011101110')
# ['B', 'y', '!']
# By!
if you do not want to install the module, read the part below.
here is a variant using the huffman tree to decode - the program works but there may be better variants to represent a binary tree (i chose a tuple).
this version may be better suited when your codewords are not all of the same length. the other nice thing about the binary tree is that here it is obvious that the code is prefix-free.
your code in tree-form looks like this (overly indented in order to make the tree-structure visible):
huffman_tree = \
( # 0
( # 00
None,
# 01
( # 010
None,
# 011
( # 0110
None,
# 0111
( # 01110
'!',
# 01111
'B')))),
# 1
( # 10
( # 100
None,
# 101
( # 1010
( # 10100
'l',
# 10101
None
),
# 1011
( # 10110
'q',
# 10111
'y'))),
# 11
None))
using that you can then decode with:
def huffman_decode(strg):
ret = ''
cur_node = huffman_tree
for char in strg:
cur_node = cur_node[int(char)]
if cur_node is None:
raise ValueError
elif isinstance(cur_node, str):
ret += cur_node
cur_node = huffman_tree
return ret
print(huffman_decode('011111011101110'))
if the decoding hits None
some error occurred and ValueError
is raised. as soon as the decoding hits a string, the current node cur_node
is reset to the 'root node' and the game begins from the beginning of the tree.
and just because i can: here is a visual display of your (incomplete) huffman tree (this may help understand what the algorithm does: whenever you encounter a 0
: go right+down; whenever you encounter a 1
: go right+up); if you hit an end-node: return the character at that node and restart at the root node.
