
I'm following the approach of the gzip example code that ships with zlib. For initialization I use `deflateInit2(p_strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED, (15+16), 8, Z_DEFAULT_STRATEGY)`. I'm compressing a stream: each packet is deflated with `Z_FULL_FLUSH`, except the last, for which I use `Z_FINISH`. After compressing each packet, I reorder the packets.

data in packets ---> [zip] ---> [reordering] ---> ...

If I inflate the data right after the zip step, I get exactly the file I had before zipping. If I inflate the data after reordering the packets (again: each packet is deflated with `Z_FULL_FLUSH`, except the last, which uses `Z_FINISH`), I get a file that is very similar to the original. The difference is at the end of the file: some bytes are missing. That's because inflating returns an error (`Z_DATA_ERROR`) on the last packet. If I inflate in chunks of, say, 50 KB, the inflated file after reordering matches the input minus up to 50 KB (the whole last chunk is lost because of the error). If I decrease the inflate chunk size to 8 bytes, I still get `Z_DATA_ERROR`, but I lose less data while inflating (in my example, one byte of the original file is missing).

I'm not reordering the last packet (`Z_FINISH`). I also tried sending all of the packets with `Z_FULL_FLUSH` and then sending one more "empty" packet (only `Z_FINISH`, which is 10 bytes).

Why is this happening? If I use `Z_FULL_FLUSH`, why can't the inflater inflate it correctly? Does it remember the order of the deflated packets?

Any information will help. Thanks.

hudac
  • What gave you the impression that zip was resilient to packet reordering? – Raymond Chen Oct 30 '13 at 01:21
  • It just works: I inflate the file and it works (except for the last chunk of the inflate). And then I understood that `Z_FULL_FLUSH` is responsible for that... – hudac Oct 30 '13 at 12:15

2 Answers


Since you are using Z_FULL_FLUSH which erases the history at each flush, you can reorder the packets, except for the last one. The one you did Z_FINISH on must be the last packet. It doesn't need to have any data though. You can feed all of your data from your last packet using Z_FULL_FLUSH, and then do one final packet with no input data and Z_FINISH. That will permit you to reorder the packets before that empty one all you like. Just always have that last one at the end.

The reason is that the deflate format is self-terminating, so that last piece marks the end of the stream. If you reorder it to the middle somewhere, then inflation will stop when it hits that packet.
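As a rough illustration of both points, here is a minimal sketch using Python's `zlib` module, which wraps the same library as the C calls discussed here. It assumes raw deflate (`wbits=-15`) so that no CRC is involved; the helper names `deflate_packets` and `inflate_raw` and the sample data are made up for the example.

```python
import zlib

def deflate_packets(packets):
    """Hypothetical helper: raw-deflate each packet, ending it with Z_FULL_FLUSH.

    Returns (compressed_packets, terminator). The terminator is the empty
    Z_FINISH packet, which must stay at the end of the stream.
    """
    co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    compressed = [co.compress(p) + co.flush(zlib.Z_FULL_FLUSH) for p in packets]
    terminator = co.flush(zlib.Z_FINISH)  # no input data, only the final block
    return compressed, terminator

def inflate_raw(stream):
    """Hypothetical helper: inflate a complete raw deflate stream."""
    do = zlib.decompressobj(-15)
    return do.decompress(stream) + do.flush()

packets = [b"alpha " * 100, b"bravo " * 100, b"charlie " * 100]
compressed, terminator = deflate_packets(packets)

# Flushed packets in a different order, with the terminator kept last.
order = [2, 0, 1]
reordered = b"".join(compressed[i] for i in order) + terminator
expected = b"".join(packets[i] for i in order)
result = inflate_raw(reordered)

# Terminator moved into the middle: inflation stops there.
do = zlib.decompressobj(-15)
head = do.decompress(compressed[0] + terminator + compressed[1])
leftover = do.unused_data  # bytes after the end-of-stream marker are not inflated
```

In this sketch `result` should equal `expected`, while `head` holds only the first packet's data and `leftover` holds the compressed packet that came after the terminator.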

The gzip header and trailer need to be maintained at the beginning and the end, and the CRC in the trailer updated accordingly. The CRC check at the end depends on the order of the data.
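The effect of the order-dependent CRC can be reproduced with a small sketch, again using Python's `zlib` binding as a stand-in for the C API. The sample data and the helper `try_inflate` are assumptions for illustration. Only the middle packets are swapped; the header (in the first packet) and the trailer (in the last) stay in place.

```python
import zlib

# gzip wrapper, same windowBits as in the question (15 + 16).
co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, 15 + 16)
parts = [b"first " * 50, b"second " * 50, b"third " * 50]
flushed = [co.compress(p) + co.flush(zlib.Z_FULL_FLUSH) for p in parts]
tail = co.flush(zlib.Z_FINISH)  # final block plus the gzip trailer (CRC-32, length)

def try_inflate(stream):
    """Hypothetical helper: return the inflated bytes, or the zlib.error raised."""
    do = zlib.decompressobj(15 + 16)
    try:
        return do.decompress(stream)
    except zlib.error as exc:
        return exc

in_order = try_inflate(b"".join(flushed) + tail)
# Swap the second and third packets; the trailer still holds the CRC of the
# original order, so the check at the very end of the stream fails.
swapped = try_inflate(flushed[0] + flushed[2] + flushed[1] + tail)
```

Here `in_order` is the original data, while `swapped` is a `zlib.error`, the Python counterpart of `Z_DATA_ERROR`. That is consistent with the symptom in the question: everything inflates until the trailer is checked.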

Why are you trying to do what you're trying to do? What are you optimizing?

Mark Adler
  • Thanks. Actually, I'm not reordering the last packet (I forgot to write that). I also tried sending all the packets with `Z_FULL_FLUSH` and then sending the last data (0 bytes of input, 10 bytes of gzip output overhead), but that doesn't work either :\ ... – hudac Oct 30 '13 at 08:01
  • I'm doing packet reordering for optimization. – hudac Oct 30 '13 at 08:53
  • So theoretically it seems that I'm not doing anything wrong, am I? I'm not reordering the first/last packet. Does the information I supplied help pin down whether something in the CRC or the final `gzip` trailer is wrong? (The input file and the inflated output file are identical except for some bytes missing at the end of the latter. The number of missing bytes is determined by the size of the chunks I inflate.) Sorry, it's not exactly optimizing. I'm getting each packet from another component, not in the order in which I want to send it, so I reorder the packets as I want. – hudac Oct 30 '13 at 17:39
  • First get it working with raw deflate, using `-15` for `windowBits`. Then you don't have to worry about the CRC. Then, if you are properly extracting all the bytes from `deflate()` after the `Z_FULL_FLUSH` (you need to keep calling `deflate()` until it returns `stream_avail != 0`), the stream with the packets reordered, other than the last packet, should be inflatable, again using `-15` in `inflateInit2()`. – Mark Adler Oct 30 '13 at 18:06
  • Thanks! I'll try that when I'm able to and report whether it worked. – hudac Oct 30 '13 at 20:56
  • Oops, typo in my last comment -- should be: "until it returns `stream->avail_out != 0`". – Mark Adler Oct 31 '13 at 05:59
  • It works, but it doesn't help much because I need `windowBits` to produce the gzip header & trailer: `Add 16 to windowBits to write a simple gzip header and trailer around the compressed data` – hudac Nov 03 '13 at 10:04
  • Good! Getting that far helps a great deal. Now you can add the gzip header and trailer yourself. A header of `1f 8b 08 00 00 00 00 00 00 ff` will work. The trailer is two four-byte values written in little-endian order. The first is the CRC-32, which you can calculate using the `crc32()` function in zlib. You must feed it the uncompressed contents of your packets _in the order that they appear in the gzip stream_. The second four bytes are the total uncompressed length (modulo 2^32 if it's more than 4 GB). Then you have your gzip stream. – Mark Adler Nov 03 '13 at 14:35
  • Thanks! Actually I found where to zip the data, after reordering.. I will try your solution later! – hudac Nov 10 '13 at 10:24
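
The header-and-trailer recipe from the comments above can be sketched roughly as follows, using Python's `zlib`, `struct`, and `gzip` modules in place of the C calls. The function name `build_gzip` and the sample packets are assumptions for illustration; the header bytes and the trailer layout are the ones given in the comment.

```python
import gzip
import struct
import zlib

# Fixed 10-byte gzip header quoted in the comment above.
GZIP_HEADER = bytes.fromhex("1f8b08000000000000ff")

def build_gzip(raw_packets, terminator, plain_packets):
    """Hypothetical helper: wrap reordered raw deflate packets in a gzip stream.

    plain_packets must be the uncompressed data in the same order as
    raw_packets, because the CRC-32 depends on that order.
    """
    crc = 0
    length = 0
    for plain in plain_packets:
        crc = zlib.crc32(plain, crc)  # running CRC-32 over the final order
        length += len(plain)
    # Trailer: CRC-32, then uncompressed length modulo 2^32, both little-endian.
    trailer = struct.pack("<II", crc & 0xFFFFFFFF, length & 0xFFFFFFFF)
    return GZIP_HEADER + b"".join(raw_packets) + terminator + trailer

# Raw deflate (wbits=-15), one Z_FULL_FLUSH per packet, empty Z_FINISH at the end.
co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
plain = [b"one " * 200, b"two " * 200, b"three " * 200]
raw = [co.compress(p) + co.flush(zlib.Z_FULL_FLUSH) for p in plain]
terminator = co.flush(zlib.Z_FINISH)

order = [1, 2, 0]
stream = build_gzip([raw[i] for i in order], terminator, [plain[i] for i in order])
restored = gzip.decompress(stream)  # checks the CRC-32 and the length field
```

Because the CRC is computed over the packets in their final order, `gzip.decompress` accepts the stream and `restored` is the reordered data.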

GZip is a streaming protocol. The compression depends on the prior history of the stream. You can't reorder it.

user207421