
I'm having a difficult time understanding how the inflate algorithm works, even after reading the RFC and reviewing C and JavaScript implementations. I compressed a file containing the text "TestingTesting" and got the following result in hex: 0B 49 2D 2E C9 CC 4B 0F 81 50 00

I tried reading the data after 16-bit and 32-bit endian swaps, but after reading the first 3 bits I can't get any further because the next 5 bits don't make sense. What am I doing wrong, and how should this be parsed?

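References I've used: [RFC 1951](http://www.ietf.org/rfc/rfc1951.txt), [Javascript](https://github.com/dankogai/js-deflate/blob/master/rawinflate.js), [C](http://www.opensource.apple.com/source/gnuzip/gnuzip-12/gzip/inflate.c)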

Ashkan Aryan
ItsJustMe
  • Please post your code so we can actually help you. We can't read minds (yet), so it will be hard to tell where the issue is without an example. – sbtkd85 Sep 12 '11 at 17:27
  • I haven't written any code yet because I'm still trying to understand how to parse the bits correctly. After I understand the algorithm, the coding and verification will be much easier. I was hoping someone could help me break down and parse the bits in the example for the first few codes. – ItsJustMe Sep 12 '11 at 18:15
  • How are you "reading the data after 16 and 32-bit endian swaps" and getting "after reading the first 3 bits I can't get any further because the next 5 bits don't make sense"? – sbtkd85 Sep 12 '11 at 18:18
  • Parsing it by hand. Eg: 0B 49 = 0000 1011 0100 1001, swap the bytes for endian conversion, then read the first bit which is a 0 (not last block), next 2 bits = 10 (static table), then next 5 bits = 01001 which is supposed to be the number of literal codes 257 but that doesn't make sense for the sample text file I used: "TestingTesting" – ItsJustMe Sep 12 '11 at 18:27
  • Ok so how about posting a few links in your original question to the inflate algorithm and implementations you've reviewed. I'm not familiar enough with it to do it by hand off the top of my head. – sbtkd85 Sep 12 '11 at 18:33
  • [RFC 1951](http://www.ietf.org/rfc/rfc1951.txt) [Javascript](https://github.com/dankogai/js-deflate/blob/master/rawinflate.js) [C](http://www.opensource.apple.com/source/gnuzip/gnuzip-12/gzip/inflate.c) – ItsJustMe Sep 12 '11 at 18:36

1 Answer


The output from the compressor is a stream of bytes. Why are you doing an endian swap?

Looking at the first few bytes, as binary:

0B  = 00001011
49  = 01001001
2D  = 00101101  
2E  = 00101110
...
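The listing above can be reproduced with a few lines of Python. This is only a convenience sketch; the helper name `as_binary` is made up for illustration, and the hex string is the one from the question.

```python
# The compressed bytes quoted in the question.
data = bytes.fromhex("0B 49 2D 2E C9 CC 4B 0F 81 50 00")

def as_binary(data):
    """Return each byte as an 8-character binary string, MSB on the left."""
    return [format(byte, "08b") for byte in data]

# Print the same "hex = binary" table as above, for every byte.
for byte, bits in zip(data, as_binary(data)):
    print(f"{byte:02X}  = {bits}")
```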

From section 3.1.1 in the RFC:

  • Within each byte, bits are read from right to left (least-significant bit first), so the first bit of the header, BFINAL, is 1 (this is the last block):

     00001011
            ^
    
  • Numbers are packed LSB first, and we read from right to left, so the next two bits (1, then 0) give BTYPE = 01:

     00001011
          ^^
    
  • That's the fixed Huffman code block type, so we expect a Huffman code next. Huffman codes are packed MSB first, so the first code is 10000100 (we continue on to the next byte here):

     00001011
     ^^^^^
     01001001
          ^^^
    
  • Looking at the table in section 3.2.6, codes 00110000 to 10111111 (0x30 to 0xBF) represent literal bytes 0 - 143, so 10000100 (= 0x84) is the literal value 0x84 - 0x30 = 0x54, which is the ASCII code for "T".

  • Continuing, the next code is 10010101 (= 0x95), which is the literal value 0x95 - 0x30 = 0x65, which is "e".

...and so on.
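The steps above can be sketched as a small Python decoder. This is a minimal illustration, not a full inflater: it assumes every block uses the fixed Huffman codes (BTYPE = 01) and does no error checking. The names `BitReader`, `read_fixed_symbol` and `inflate_fixed` are made up for this sketch; the length and distance tables are the ones in section 3.2.5 of the RFC, and the code ranges are from section 3.2.6.

```python
import zlib  # used only for the cross-check at the end

# Base values and extra-bit counts for length codes 257..285 and distance codes 0..29.
LENGTH_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
               35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LENGTH_EXTRA = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
                3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0]
DIST_BASE = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257, 385,
             513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577]
DIST_EXTRA = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7,
              8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13]

class BitReader:
    """Hypothetical helper: hands out bits starting at the LSB of each byte."""
    def __init__(self, data):
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def bit(self):
        b = (self.data[self.pos >> 3] >> (self.pos & 7)) & 1
        self.pos += 1
        return b

    def number(self, n):
        """Read an n-bit integer field, packed LSB first."""
        return sum(self.bit() << i for i in range(n))

    def code(self, n, prefix=0):
        """Append n more bits to a Huffman code, packed MSB first."""
        for _ in range(n):
            prefix = (prefix << 1) | self.bit()
        return prefix

def read_fixed_symbol(r):
    """Decode one literal/length symbol using the fixed code of section 3.2.6."""
    c = r.code(7)
    if c <= 0b0010111:                 # 7-bit codes: symbols 256..279
        return 256 + c
    c = r.code(1, c)
    if c <= 0b10111111:                # 8-bit codes: literals 0..143
        return c - 0b00110000
    if c <= 0b11000111:                # 8-bit codes: symbols 280..287
        return 280 + (c - 0b11000000)
    c = r.code(1, c)                   # 9-bit codes: literals 144..255
    return 144 + (c - 0b110010000)

def inflate_fixed(data):
    r = BitReader(data)
    out = bytearray()
    while True:
        bfinal = r.number(1)
        btype = r.number(2)
        if btype != 1:
            raise NotImplementedError("this sketch only handles BTYPE=01")
        while True:
            sym = read_fixed_symbol(r)
            if sym < 256:              # literal byte
                out.append(sym)
            elif sym == 256:           # end of block
                break
            else:                      # <length, distance> back-reference
                i = sym - 257
                length = LENGTH_BASE[i] + r.number(LENGTH_EXTRA[i])
                d = r.code(5)          # fixed distance codes are 5 bits
                dist = DIST_BASE[d] + r.number(DIST_EXTRA[d])
                for _ in range(length):
                    out.append(out[-dist])
        if bfinal:
            return bytes(out)

data = bytes.fromhex("0B 49 2D 2E C9 CC 4B 0F 81 50 00")
print(inflate_fixed(data))             # b'TestingTesting'
print(zlib.decompress(data, -15))      # cross-check: negative wbits means raw deflate
```

For this input the stream decodes as the literals "TestingT" followed by a single back-reference (length 6, distance 7) that copies "esting", then the end-of-block code.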

Matthew Slattery
  • +1 for a concise and legible answer - it took me half-a-dozen reads of RFC1951 to properly grok what you've summarised in four bulletpoints here. – Eight-Bit Guru Dec 16 '13 at 12:44