
I am trying to understand how Huffman coding works. It is supposed to compress data so that it takes less memory than the original text, but when I encode, for example,

"Text to be encoded" 

which has 18 characters, the result I get is

"100100110100101110101011111000001110011011110010101100011"

Am I supposed to divide the resulting bits by 8, since a character has 8 bits?

r3mainer
outbeyond
    The actual result is `10010011 01001011 10101011 11100000 11100110 11110010 10110001 00000001`, i.e. **8** one-byte characters (technically, you should not *divide* by 8 but *group* the bits into 8-bit chunks). A more accurate comparison is `"Text to be encoded" == 18 * 8 = 144 bits` before and `57` bits after compression – Dmitry Bychenko Jan 08 '18 at 21:59
    "Text to be encoded" is a string. Each character in the uncompressed string is represented by an 8-bit ASCII character making the total uncompressed string 18*8=144 bits. The Huffman code is 57 bits. – jodag Jan 08 '18 at 22:01

1 Answer


You should compare the same units: either bits (as in the output after compression) or characters (as in the text before), e.g.

before: "Text to be encoded" == 18 * 8 bits = 144 bits
                             == 18 * 7 bits = 126 bits (in case of 7-bit characters)
after:  100100110100101110101011111000001110011011110010101100011 = 57 bits
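To check where the 57-bit figure comes from, here is a minimal sketch that builds a Huffman code for the string with Python's `heapq` and counts the encoded bits. The helper name `huffman_code` is hypothetical, and this is not necessarily the encoder the asker used: it assumes the usual "merge the two lightest subtrees" construction with ties broken by insertion order.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Build a Huffman code table mapping each character to a bit string (sketch)."""
    freq = Counter(text)
    tiebreak = count()  # unique counter so heapq never has to compare dicts
    # Each heap entry: (subtree weight, tiebreak, {char: code so far})
    heap = [(w, next(tiebreak), {ch: ""}) for ch, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single distinct symbol gets code "0"
        return {ch: "0" for ch in freq}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # lightest subtree
        w2, _, right = heapq.heappop(heap)  # second-lightest subtree
        # Prepend '0' to codes in the lighter subtree and '1' to the other
        merged = {ch: "0" + c for ch, c in left.items()}
        merged.update({ch: "1" + c for ch, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


text = "Text to be encoded"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)

print(len(text) * 8, "bits before (8-bit characters)")
print(len(text) * 7, "bits before (7-bit characters)")
print(len(encoded), "bits after Huffman coding")
```

Running it should print 144, 126 and 57. The individual codewords (and therefore the exact bit string) depend on how ties between equal frequencies are broken, so they may differ from the string in the question, but every Huffman code for this text has the same total length of 57 bits.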

So you have 144 (or 126) bits before and 57 bits after compression. Or, counting characters:

before: "Text to be encoded" == 18 characters
after:   10010011 
         01001011
         10101011
         11100000
         11100110
         11110010
         10110001
         00000001 /* the last chunk is padded */ == 8 characters 

So you have 18 ASCII characters before and only 8 one-byte characters after compression. If characters are 7-bit (the 0..127 range of the ASCII table), we get 9 characters after compression:

after:  1001001 'I'
        1010010 'R'
        1110101 'u'
        0111110 '>'
        0000111 '\x07'
        0011011 '\x1B'
        1100101 'e'
        0110001 '1'
        0000001 '\x01' /* the last chunk is padded */
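Both groupings can be reproduced with a short Python sketch. The helper `chunk_bits` is hypothetical; it assumes the same convention as the lists in this answer, where the last partial chunk is left-padded with zeros (so the trailing `1` becomes `00000001` or `0000001`). A decoder would also need to know the original bit count (57) so it can ignore that padding.

```python
def chunk_bits(bits: str, width: int) -> list[str]:
    """Split a bit string into fixed-width chunks.

    The final short chunk (if any) is left-padded with zeros,
    matching the padding convention used in the lists above.
    """
    chunks = [bits[i:i + width] for i in range(0, len(bits), width)]
    if chunks:
        chunks[-1] = chunks[-1].rjust(width, "0")
    return chunks


bits = "100100110100101110101011111000001110011011110010101100011"

# 8-bit grouping: one chunk per byte
bytes8 = chunk_bits(bits, 8)
print(len(bytes8), bytes8)

# 7-bit grouping: each chunk read as a 7-bit ASCII code
chars7 = chunk_bits(bits, 7)
for c in chars7:
    print(c, repr(chr(int(c, 2))))
```

The first print reports 8 chunks; the loop prints 9 lines, one per 7-bit chunk, with `repr` showing control characters such as BEL and ESC as escapes.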
Dmitry Bychenko