I am trying to implement Huffman coding in Python 3 using the code from http://en.literateprograms.org/Huffman_coding_%28Python%29, but it doesn't work. The same code runs fine in Python 2.7.

The following lines are the problem:

heapq.heapify(trees)
while len(trees) > 1:
    childR, childL = heapq.heappop(trees), heapq.heappop(trees)
    parent = (childL[0] + childR[0], childL, childR)
    heapq.heappush(trees, parent)

I get a TypeError in heapq.heappush(trees, parent): "unorderable types: tuple() < str()"

I've searched for a solution, and I think I have to implement an __lt__ method. Two or more nodes can have the same frequency; heapq then falls back to comparing the rest of the tuples, and I think it can't compare a nested tuple with a string. But I don't know where and how to define a comparison method to solve this problem. Can anybody help? ;-)

Ashwini Chaudhary

2 Answers

I just had exactly the same problem when porting some old code from Python 2 to Python 3, among other things a Huffman coding algorithm using heapq. The problem is indeed that two entries in the heap sometimes have the same probability but a differently structured "tuple tree" once some symbols have been merged.

Example: If the heap contains (0.2, ("A", "B")) and (0.2, (("C", "D"), "E")), the probabilities are equal, so Python moves on to the second elements and tries to compare strings with tuples. Python 2 will just "sort" the mismatched types by their type name, which does not make much sense but does not hinder the algorithm either. Python 3, on the other hand, is stricter and raises an exception.
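The failure can be reproduced without heapq at all, because tuples compare element by element and only look at the second element when the first ones are equal. A minimal sketch, using made-up entries shaped like the ones in the example:

```python
# Two entries with the same probability but differently shaped trees.
a = (0.2, ("A", "B"))
b = (0.2, (("C", "D"), "E"))

# Different probabilities: the first element decides and the trees
# are never looked at, so this comparison is safe.
first_decides = (0.1, ("A", "B")) < b

# Equal probabilities: Python falls through to the trees and ends up
# comparing "A" (a str) with ("C", "D") (a tuple).
try:
    a < b
    outcome = "compared without error"
except TypeError as exc:
    outcome = "TypeError: {}".format(exc)

print(first_decides)
print(outcome)
```

The exact wording of the TypeError message differs between Python 3 releases, so the sketch only reports whatever message the interpreter produces.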

My workaround was to add another element into the tuple in between the accumulated probability and the "tuple-tree", to avoid the actual values from being compared. You could use, e.g., the hash or the repr of the tuples, or some random number or ever-increasing counter.
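The counter variant of this workaround could look roughly like the sketch below. It is not the literateprograms code; build_tree, assign_codes and the sample frequencies are hypothetical names and data chosen for illustration, and it assumes a non-empty mapping whose symbols are strings rather than tuples.

```python
import heapq
from itertools import count


def build_tree(frequencies):
    """Build a Huffman tree from a non-empty {symbol: weight} mapping.

    Heap entries are (weight, tiebreak, tree). The tiebreak integer is
    unique, so a comparison never reaches the tree itself.
    """
    tiebreak = count()
    heap = [(weight, next(tiebreak), symbol)
            for symbol, weight in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    return heap[0][2]


def assign_codes(tree, prefix=""):
    """Walk the tree: inner nodes are 2-tuples, leaves are symbols."""
    if isinstance(tree, tuple):
        left, right = tree
        codes = assign_codes(left, prefix + "0")
        codes.update(assign_codes(right, prefix + "1"))
        return codes
    # A tree consisting of a single symbol still gets a one-bit code.
    return {tree: prefix or "0"}


# Made-up frequencies with several ties, the case that fails without
# the tiebreak element.
codes = assign_codes(build_tree({"A": 1, "B": 1, "C": 1, "D": 1, "E": 2}))
print(codes)
```

An ever-increasing counter has the side effect that entries with equal weight are popped in insertion order, which keeps the resulting tree deterministic; a random number or a hash would avoid the TypeError in most cases but could still collide.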

I am aware that this is an immensely ugly hack, but unless you want to define your own tree node class with a custom comparison method (__lt__ in Python 3, which no longer uses __cmp__), this seems to be the only way. If anyone has a better idea, feel free to comment.
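For comparison, the node-class alternative might be sketched as follows. The Node class, its attributes and the sample probabilities are assumptions made for illustration; the only thing heapq relies on here is the "<" operator, which __lt__ provides.

```python
import heapq


class Node:
    """Hypothetical tree node; only the weight takes part in ordering."""

    def __init__(self, weight, symbol=None, left=None, right=None):
        self.weight = weight
        self.symbol = symbol   # set for leaves, None for inner nodes
        self.left = left
        self.right = right

    def __lt__(self, other):
        # Ties are simply reported as "not less than", so the children
        # are never compared.
        return self.weight < other.weight


# Made-up probabilities, including a tie between "A" and "B".
heap = [Node(w, symbol=s) for s, w in {"A": 0.2, "B": 0.2, "C": 0.6}.items()]
heapq.heapify(heap)
while len(heap) > 1:
    a = heapq.heappop(heap)
    b = heapq.heappop(heap)
    heapq.heappush(heap, Node(a.weight + b.weight, left=a, right=b))
root = heap[0]
print(root.weight)
```

This keeps the heap entries free of helper fields, at the cost of a small class and of tie order depending on how heapq happens to arrange equal-weight nodes.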

tobias_k

The reason is that in Python 3.x you can't order values of unrelated types, such as str and int, with <:

>>> "foo" < 1
Traceback (most recent call last):
  File "<ipython-input-5-de2fb49cc8c4>", line 1, in <module>
    "foo" < 1
TypeError: unorderable types: str() < int()
Ashwini Chaudhary