I'm building a Python program to compress and decompress a text file using a Huffman tree. Previously, I stored the frequency table in a .json file alongside the compressed file. When I read in the compressed data and the .json, I rebuilt the decompression tree from the frequency table. I thought this was a pretty elegant solution.
However, I ran into an odd issue with medium-length files: they would decompress into strings of seemingly random characters. I found that the issue occurred when two characters occurred the same number of times. When I rebuilt my tree, characters with matching frequencies could end up swapped. For most files, particularly large and small ones, this wasn't a problem, because most letters occurred slightly more or slightly less often than the others. But in some medium-sized files, a large portion of the characters shared a frequency with another character, and the output came out as gibberish.
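A likely culprit is that ties are broken by whatever order the symbols happen to be fed into the priority queue, and that order can differ between the compressor and the decompressor. The sketch below is one possible fix, assuming a `heapq`-based build: sort the table by `(frequency, symbol)` before building, and give every heap entry a unique increasing counter so equal frequencies never fall back to comparing nodes. The helper names `build_codes`, `encode`, and `decode` are hypothetical, not taken from the original program.

```python
import heapq
import json
from collections import Counter
from itertools import count


def build_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code table that depends only on the (symbol, frequency)
    pairs, not on the order the dict happens to iterate in."""
    if not freqs:
        return {}
    tiebreak = count()  # unique, increasing ints so heap entries never tie
    # Sort by (frequency, symbol) so equal-frequency symbols always enter the
    # heap in the same order, whether freqs came from Counter or from JSON.
    heap = [(f, next(tiebreak), sym)
            for sym, f in sorted(freqs.items(), key=lambda kv: (kv[1], kv[0]))]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Internal nodes are (left, right) tuples; leaves are plain strings.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    root = heap[0][2]
    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, str):
            codes[node] = prefix or "0"  # a lone-symbol input still gets one bit
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")

    walk(root, "")
    return codes


def encode(text: str, codes: dict[str, str]) -> str:
    return "".join(codes[ch] for ch in text)


def decode(bits: str, codes: dict[str, str]) -> str:
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:  # Huffman codes are prefix-free, so first match wins
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


text = "aabbccddeeffg"  # many symbols share the same count
freqs = Counter(text)
bits = encode(text, build_codes(freqs))

# Simulate the decompressor: reload the table from JSON in reversed order.
loaded = json.loads(json.dumps(dict(reversed(list(freqs.items())))))
assert build_codes(loaded) == build_codes(freqs)
assert decode(bits, build_codes(loaded)) == text
```

Because every heap key `(frequency, counter)` is unique and the counter values are assigned in sorted order, the same frequency table always produces the same tree, regardless of how the dict was ordered when it was loaded.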
Is there a unique identifier for my nodes that I can use instead to easily rebuild my tree? Or should I be approaching writing the tree completely differently?
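A different approach is to store the tree's shape itself instead of the frequencies, so the decompressor never has to rebuild anything. This is a minimal sketch, assuming the same hypothetical representation as a `heapq`-style build (internal nodes are `(left, right)` tuples, leaves are single-character strings); `tree_to_json`, `tree_from_json`, and `decode_with_tree` are illustrative names, not an existing API.

```python
import json


def tree_to_json(node) -> str:
    # Tuples serialize as JSON arrays; leaves stay plain strings.
    return json.dumps(node)


def tree_from_json(data: str):
    def convert(node):
        if isinstance(node, str):
            return node
        left, right = node  # JSON arrays come back as 2-element lists
        return (convert(left), convert(right))
    return convert(json.loads(data))


def decode_with_tree(bits: str, root) -> str:
    if isinstance(root, str):
        # Assumed convention: a single-symbol tree spends one bit per symbol.
        return root * len(bits)
    out = []
    node = root
    for bit in bits:
        node = node[int(bit)]  # "0" goes left, "1" goes right
        if isinstance(node, str):
            out.append(node)
            node = root
    return "".join(out)


root = (("a", "b"), "c")     # codes: a=00, b=01, c=1
stored = tree_to_json(root)  # '[["a", "b"], "c"]'
assert decode_with_tree("00011", tree_from_json(stored)) == "abc"
```

The trade-off is that the stored tree is usually a bit larger than a frequency table, but it removes tie-breaking from the decompression path entirely. Storing only per-symbol code lengths and assigning canonical Huffman codes is a more compact variant of the same idea.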