
I have two applications that communicate over a network link. The sender compresses each packet using zlib's deflate() and sends it over the network. The receiver then decompresses the data using inflate(). Packets can be lost on the network link, so in order to minimize decompression errors I have implemented the following approach:
Sender

  1. calls deflate() with Z_SYNC_FLUSH most of the time, but intermittently calls deflate() with Z_FULL_FLUSH.

  2. sends (along with the data) a 2-byte field that contains a sequence number and a bit indicating whether FULL_FLUSH or SYNC_FLUSH was used.
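
For illustration, the 2-byte field could be packed as in the sketch below. The exact layout is an assumption, not something stated above: the top bit carries the FULL_FLUSH flag, the low 15 bits carry the sequence number, and the field is big-endian. The names `pack_header` and `unpack_header` are hypothetical.

```python
import struct

FULL_FLUSH_BIT = 0x8000  # assumed layout: top bit flags a FULL_FLUSH packet
SEQ_MASK = 0x7FFF        # assumed layout: low 15 bits hold the sequence number


def pack_header(seq, full_flush):
    """Build the 2-byte field that travels with each compressed packet."""
    value = (seq & SEQ_MASK) | (FULL_FLUSH_BIT if full_flush else 0)
    return struct.pack(">H", value)


def unpack_header(packet):
    """Split a received packet into (seq, full_flush, compressed_payload)."""
    (value,) = struct.unpack(">H", packet[:2])
    return value & SEQ_MASK, bool(value & FULL_FLUSH_BIT), packet[2:]
```

With 15 bits the sequence number wraps at 32768, so the receiver has to compare sequence numbers modulo that value.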

Receiver

  1. Reads in the data and uses the sequence number to detect whether a packet has been lost. When NO packet has been lost, the 2-byte field is removed and the decompression works properly.
  2. When a packet loss is detected, the receiver checks whether the current packet is a FULL_FLUSH or a SYNC_FLUSH packet.

    • If it's a SYNC_FLUSH, then the packet is simply dropped and we proceed with the next packet.

    • If it's a FULL_FLUSH, however, the receiver removes the extra 2-byte field and calls inflate().

This works 99% of the time, in the sense that inflate() succeeds and the uncompressed data is identical to what the sender had before compression. This was my expectation!
Once in a while, however, this approach puts the receiver in a bad state where every subsequent packet (FULL_FLUSH packets included) fails to decompress. inflate() returns Z_DATA_ERROR and zlibContext.zstream.msg contains 'incorrect header check', although I have occasionally received an 'invalid distance too far back' message.

My first question is

Should I expect inflate() to recover and succeed when the packet at hand was compressed using the FULL_FLUSH flush mode, even if previous packets were lost? For example, the sender compresses the first 3 packets using deflate(ctx, SYNC_FLUSH) and sends them, one at a time, over the network. The sender then compresses the fourth packet using deflate(ctx, FULL_FLUSH) and sends it across the network. The receiver receives packets 1 and 2 and calls inflate() with success. The receiver then receives packet 4 and detects (via the sequence number) that it has missed packet 3. Since packet 4 was compressed using a FULL_FLUSH, the receiver expects that inflate() will successfully decompress the packet. Is this a valid assumption?

My second question is

Is there anything else I need to do in the receiver to be able to recover from packet loss and continue decompressing packets?

Ha.
Mark S
  • Hey Mark, welcome to Stack Overflow. I'd suggest you take a few minutes and re-format your question to clarify what exactly you're asking. There are several formatting tools available to you as well (code-formatting esp) that could be of use here. – brandonscript Aug 26 '16 at 17:45

2 Answers

1

Your logic is slightly wrong. The FULL_FLUSH packet is the one that performs the full flush, so its data is still compressed against the history built up by the earlier packets. Only after that packet has the state been reset. By processing the FULL_FLUSH packet after a loss, you are attempting to decode data that depends on state you no longer have.

You can, however, resume after the flush, because at that point the state has been reset.

So after a loss, you don't want to process the FULL_FLUSH packet because you don't have the context necessary to process it. However, after that packet, the state has been fully flushed, so you can resume inflation with the next packet.

So your packet loss logic should be:

  1. Wait until you receive a packet with the FULL_FLUSH bit set.
  2. Wait for the next packet.
  3. If no packets have been lost between the FULL_FLUSH packet and this packet, then resume inflation (with a clean context!) starting with this packet.
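
The three steps above can be sketched as a small state machine. This is only an illustration under assumptions: the class name `Resync`, the action strings, and the 15-bit sequence space (`SEQ_MODULO`) are hypothetical and not taken from the question. "inflate_fresh" means: reset the inflater (or create a new one), then inflate this packet.

```python
from dataclasses import dataclass

SEQ_MODULO = 1 << 15  # assumption: 15-bit sequence number next to the flush bit


@dataclass
class Resync:
    expected_seq: int = 0
    synced: bool = True             # inflater state matches the sender's
    after_full_flush: bool = False  # previous received packet was a FULL_FLUSH

    def accept(self, seq, is_full_flush):
        """Return 'inflate', 'inflate_fresh' or 'drop' for the packet at hand."""
        in_order = seq == self.expected_seq
        self.expected_seq = (seq + 1) % SEQ_MODULO
        if in_order and self.synced:
            action = "inflate"
        elif in_order and self.after_full_flush:
            # No loss since the FULL_FLUSH packet: restart with a clean context
            self.synced = True
            action = "inflate_fresh"
        else:
            # Loss detected, or still waiting for a FULL_FLUSH boundary
            self.synced = False
            action = "drop"
        self.after_full_flush = is_full_flush
        return action
```

For example, if packet 2 is lost and packet 3 is the FULL_FLUSH packet, packet 3 is dropped and packet 4 is the first one handed to a fresh inflater.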
David Schwartz
  • Even that will not work. The first packet contains the stream header. If it is lost, inflate will not decode packets even after FULL_FLUSH. Client and server will never synchronize in this case. – Ha. Aug 26 '16 at 20:36
  • The author of zlib [says it will work](http://stackoverflow.com/a/38552808/721269) and it has worked for random access compressed files. – David Schwartz Aug 26 '16 at 21:01
  • I wrote a program that demonstrated that it doesn't work and looked at zlib sources to understand why - deflate stream contains a 2-byte header. Once you read it, you can jump to any block boundary. – Ha. Aug 26 '16 at 21:21
  • I modified my logic to follow your suggestions and my results are much better. Thank you for your insight & help, David Schwartz! My only remaining issue now is the case where the decompressor misses the first packet. I will now try the suggestions from Mark Adler, that is to run in 'raw' mode. – Mark S Aug 30 '16 at 20:05
1

If you properly break the compressed stream after the full flush, then yes, what follows will always be decompressible with a new or reset instance of inflate.

After the full flush, you must call deflate() until avail_out is not zero, in order to emit the end of the previous stream. What follows that is what you would put in your packet labeled as following a full flush. It is possible that you are not properly locating the start of the compressed data following the full flush, since depending on your buffer size, the flush may happen to be completed most of the time on the first deflate() call.

On the receiver side, make sure that the inflator is starting fresh and decoding in raw mode, which would be done with an inflateInit2() with windowBits equal to -15, or inflateReset() on such a raw inflate state. You must already be doing that if it is working 99% of the time, or ever for that matter.
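
As a rough illustration of the raw-mode restart, here is a sketch using Python's `zlib` binding rather than the C API, purely so that it is short and runnable. Passing `wbits=-15` corresponds to windowBits equal to -15 in deflateInit2()/inflateInit2(). The helper `make_packets`, the payloads, and the "full flush every third packet" policy are made up for the example, and the sender is assumed to use raw deflate as well.

```python
import zlib

RAW = -15  # negative wbits selects raw deflate: no zlib header or trailer


def make_packets(payloads, full_flush_every=3):
    """Compress payloads on one raw deflate stream, one flushed packet each."""
    comp = zlib.compressobj(wbits=RAW)
    packets = []
    for number, payload in enumerate(payloads, start=1):
        full = number % full_flush_every == 0
        mode = zlib.Z_FULL_FLUSH if full else zlib.Z_SYNC_FLUSH
        # flush() returns all pending output, so the packet ends where the flush ends
        packets.append((full, comp.compress(payload) + comp.flush(mode)))
    return packets


payloads = [b"packet %d: " % n + b"sensor=42;" * 8 for n in range(1, 7)]
packets = make_packets(payloads)  # packets 3 and 6 are FULL_FLUSH packets

# Pretend packets 1-3 never arrived. Packet 4 is the first one after a full
# flush, so a fresh raw inflater can start there and keep going with packet 5.
fresh = zlib.decompressobj(wbits=RAW)
recovered = fresh.decompress(packets[3][1]) + fresh.decompress(packets[4][1])

# A default (zlib-wrapped) inflater looks for a 2-byte zlib header at the
# start of its input and rejects the same packet.
try:
    zlib.decompressobj().decompress(packets[3][1])
    header_error = None
except zlib.error as exc:
    header_error = str(exc)
```

Here `recovered` equals the original payloads of packets 4 and 5, while the zlib-wrapped attempt fails with a data error, which is consistent with the header-check failure described in the question.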

Mark Adler
  • I have looked closer at the implementation, and I see that I am NOT using the zlib _raw_ mode. Despite this, I tried calling inflateReset() in the decompressor just before decompressing a 'Fresh Dictionary' packet, and the result is decompression errors. The call to inflateReset() succeeds, but inflate() fails with code -3. I will now try in _raw_ mode. Thank you @MarkAdler for your help. I will post my results. – Mark S Aug 30 '16 at 20:10
  • I modified the application to use zlib 'raw' mode, and I have confirmed that even when I have lost the first (or first few) packets of a zlib stream, the decompressor can resume on a 'fresh dictionary' packet successfully. Thank you @DavidSchwartz and Mark Adler for all of your help. – Mark S Sep 07 '16 at 19:23