3

I need a Huffman coding implementation (preferably in Python or Java) that encodes text not one character at a time (a = 10, b = 11) but two characters at a time (ab = 11, ag = 10). Is this possible, and if so, where could I find it? Maybe it's somewhere on the internet and I just can't find it.

danben
Adomas
  • If this is homework, please tag as such. – danben May 22 '10 at 14:45
  • Not completely homework. I promised my teacher I would do this and now I can't. I thought it would be much easier :) – Adomas May 22 '10 at 14:47
  • Did you try searching for some Huffman coding Python code? I found some right away on Google with the keywords 'huffman python'. As IVlad says below, there really isn't much difference between using a single character and using two characters as your symbols. It should be pretty easy to adapt code that uses one character to use two. Of course, if the string has an odd number of characters, you will need one symbol that contains only a single character. – Justin Peel May 22 '10 at 15:42

3 Answers

6

Huffman coding doesn't care about characters; it cares about symbols. It is generally used to encode letters of the alphabet or other single characters, but it can very easily be generalized to encode strings of characters. Basically, you would take an existing implementation and allow symbols to be strings rather than characters. A leaf node would then correspond to a string of characters rather than a single character.
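As a rough illustration of that idea, here is a minimal sketch in Python that treats two-character strings as the symbols. The function names (`split_pairs`, `build_code`, `encode`, `decode`) are made up for this example and do not come from any particular library; it also assumes that a trailing single character (for odd-length text) is simply treated as its own symbol, and it produces bit strings rather than packed bytes.

```python
import heapq
from collections import Counter
from itertools import count


def split_pairs(text):
    """Split text into two-character symbols; the last one may be one character."""
    return [text[i:i + 2] for i in range(0, len(text), 2)]


def build_code(symbols):
    """Return a dict mapping each distinct symbol to a Huffman bit string."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    # Each heap entry is (weight, tiebreak, {symbol: code-so-far}) for one subtree.
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(text, code):
    """Encode text as a string of '0'/'1' characters using pair symbols."""
    return "".join(code[sym] for sym in split_pairs(text))


def decode(bits, code):
    """Decode a bit string; works because Huffman codes are prefix-free."""
    inverse = {bits_: sym for sym, bits_ in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    text = "abagababagab"
    code = build_code(split_pairs(text))
    bits = encode(text, code)
    print(code, bits, decode(bits, code) == text)
```

The only change from a per-character version is `split_pairs`: everything after that point works on arbitrary hashable symbols, which is the point made above.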

danben
1

There's a Huffman encoder example distributed with the Python bitarray module, if that's any use to you.

Scott Griffiths
0

There is probably some code somewhere, but this sounds like a parsing and tokenising question. One of the first questions I would answer is how many unique pairs you are dealing with. Huffman encoding works best with a small number of tokens, for example the 101 characters on your keyboard. But if your two characters can be anything, you massively expand the maximum number of symbols: 101 single characters allow up to 101 × 101 = 10,201 possible pairs.
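To get a feel for how large the symbol set actually becomes for a given input, a quick count along these lines may help. This is only a sketch: `pair_stats` is a hypothetical helper name, and it assumes non-overlapping pairs and ignores a trailing single character.

```python
def pair_stats(text):
    """Compare the theoretical number of pairs with the pairs actually present."""
    # Non-overlapping full pairs only; a trailing single character is skipped.
    pairs = [text[i:i + 2] for i in range(0, len(text) - 1, 2)]
    distinct_chars = len(set(text))
    return {
        "distinct_chars": distinct_chars,
        "possible_pairs": distinct_chars ** 2,
        "observed_pairs": len(set(pairs)),
    }


if __name__ == "__main__":
    print(pair_stats("abagab"))
```

In real text the number of observed pairs is usually far below the theoretical maximum, so it is worth measuring before deciding whether the larger code table is a problem.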

drekka