How do you determine how many bits per character are required for a fixed-length code in a string using Huffman? I had an idea that you count the number of different characters in the string and then write that number in binary, so that its length is the fixed length, but it doesn't work. For example, in the string "letty lotto likes lots of lolly" there are 10 different characters, excluding the quotes. Since 10 = 1010 in binary (4 bits), I thought all the characters could be represented using 4 bits. But the frequency of f is 1, and it is encoded as 11111 (5 bits), not 4.
[Huffman coding](http://en.wikipedia.org/wiki/Huffman_code) doesn't use fixed-length codewords. Roughly speaking, the length of each codeword is inversely proportional to the frequency with which it occurs. – Oliver Charlesworth Nov 07 '11 at 09:30
1 Answer
Let's say you have a string with 50 "A"s, 35 "B"s and 15 "C"s.
With a fixed-length encoding, you could represent each character in that string using 2 bits. There are 100 total characters, so when using this method, the compressed string would be 200 bits long.
Alternatively, you could use a variable-length encoding scheme. If you allow the characters to have a variable number of bits, you could represent "A" with 1 bit ("0"), "B" with 2 bits ("10") and "C" with 2 bits ("11"). With this method, the compressed string is 150 bits long, because the most common pieces of information in the string take fewer bits to represent.
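To make the arithmetic concrete, here is a small sketch that recomputes both totals for the 50/35/15 example. The variable names are only illustrative, and the code table is the one given above rather than something the script derives.

```python
import math

# Character counts from the example: 50 "A"s, 35 "B"s, 15 "C"s.
counts = {"A": 50, "B": 35, "C": 15}

# Fixed-length: every character gets ceil(log2(number of distinct symbols)) bits.
fixed_bits_per_char = max(1, math.ceil(math.log2(len(counts))))
fixed_total = fixed_bits_per_char * sum(counts.values())

# Variable-length: the codewords from the example above.
codes = {"A": "0", "B": "10", "C": "11"}
variable_total = sum(counts[ch] * len(codes[ch]) for ch in counts)

print(fixed_bits_per_char, fixed_total, variable_total)
```

The variable-length total is just each character's count multiplied by the length of its codeword, summed over all characters.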
Huffman coding specifically refers to a method of building a variable-length encoding scheme, using the number of occurrences of each character to do so.
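As a rough illustration of how the occurrence counts drive the code, here is a minimal sketch of the usual Huffman construction: repeatedly merge the two least frequent subtrees. It only tracks codeword lengths, not the actual bit patterns, and `huffman_code_lengths` is a name made up for this example. When frequencies tie, the individual lengths can depend on tie-breaking, although the total encoded size does not.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {character: codeword length in bits} for a Huffman code of text."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {ch: 1 for ch in counts}
    # Heap entries are (weight, tie_breaker, {char: depth in subtree}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)  # least frequent subtree
        w2, _, b = heapq.heappop(heap)  # second least frequent subtree
        # Merging pushes every character in both subtrees one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

lengths = huffman_code_lengths("A" * 50 + "B" * 35 + "C" * 15)
print(lengths)
```

For the 50/35/15 string this gives "A" one bit and "B" and "C" two bits each, matching the hand-built code above.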
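The fixed-length scheme you're describing is entirely separate from Huffman coding. If your goal is to encode text using a fixed-length code, then your method of working out how many bits to use per character is essentially right: for n distinct characters you need ⌈log2(n)⌉ bits, which is 4 bits for your 10 characters. The 5-bit codeword you saw for f comes from Huffman's variable-length code, where rare characters get longer codewords and common ones get shorter ones.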
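For the fixed-length case, a short sketch of the bit count might look like this. `fixed_length_bits` is a hypothetical helper name. It uses the bit length of n - 1 rather than of n, which is the same as ⌈log2(n)⌉ and avoids over-counting when the number of distinct characters is an exact power of two (8 characters need 3 bits, even though 8 itself is 1000 in binary).

```python
def fixed_length_bits(text):
    """Bits per character for a fixed-length code covering every distinct character."""
    n = len(set(text))
    if n == 0:
        return 0
    # The codes 0 .. n-1 must all fit, so use the bit length of the largest one.
    # A single-symbol alphabet is given 1 bit so each character occupies space.
    return max(1, (n - 1).bit_length())

text = "letty lotto likes lots of lolly"
bits = fixed_length_bits(text)
print(len(set(text)), bits, bits * len(text))
```

For the string in the question this reports 10 distinct characters (the space counts as one) and 4 bits per character.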
