
Does any mainstream compression algorithm, for example snappy, zlib, or bzip2, natively support streaming data across the network? For example, if I have to send a compressed payload, will I have to manually prepend the size of the payload before sending the message? Or does any library provide an API to tell whether a message is complete given x bytes?

Curious

3 Answers


zlib, bzip2, lz4, zstd, brotli, lzma2, and many others all support streaming through the use of an end-of-data marker in the compressed data.

As it happens, one of the ones you mentioned, snappy, is not streamable in the sense you describe, since the format starts with the uncompressed size.
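To make the end-of-data-marker point concrete: a format like zlib's carries its own terminator (the final deflate block plus an Adler-32 check), so the receiver never needs an out-of-band length. A minimal sketch using Python's built-in zlib bindings (chosen here purely for brevity; the C API exposes the same signal as the Z_STREAM_END return code from inflate()):

```python
import zlib

# Compress a payload; the zlib format ends with an end-of-data
# marker (final deflate block) plus an Adler-32 checksum.
payload = zlib.compress(b"hello, streaming world" * 50)

# Feed the receiver arbitrary-sized network chunks; the
# decompressor itself signals when the message is complete.
d = zlib.decompressobj()
received = b""
for i in range(0, len(payload), 7):     # pretend 7-byte packets
    received += d.decompress(payload[i:i + 7])
    if d.eof:                           # end-of-stream marker consumed
        break

assert d.eof and received == b"hello, streaming world" * 50
```

The same pattern works for bzip2, xz/lzma, and zstd decompressors: feed bytes as they arrive, and the library tells you when the stream is finished.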

Mark Adler

Zstd does. There is a ZSTD_compressStream()/ZSTD_decompressStream() API.

See https://github.com/facebook/zstd/tree/dev/examples.

Pseudo-code below (adapted from the streaming_compression.c example; error handling elided):

// Create and initialize the compression stream
ZSTD_CStream* const cstream = ZSTD_createCStream();
size_t const initResult = ZSTD_initCStream(cstream, cLevel);
// check ZSTD_isError(initResult) here

// toRead starts at the recommended input size, ZSTD_CStreamInSize()
size_t read, toRead = buffInSize;
while ((read = fread(buffIn, 1, toRead, file))) {
    ZSTD_inBuffer input = { buffIn, read, 0 };

    // Process the chunk until the whole input buffer is consumed
    while (input.pos < input.size) {
        ZSTD_outBuffer output = { buffOut, buffOutSize, 0 };

        // Compress data; the return value is a hint for the next read size
        toRead = ZSTD_compressStream(cstream, &output, &input);
        // check ZSTD_isError(toRead) here
        fwrite(buffOut, 1, output.pos, fout);
    }
}

// End the stream: flushes remaining data and writes the end-of-frame marker
ZSTD_outBuffer output = { buffOut, buffOutSize, 0 };
ZSTD_endStream(cstream, &output);
fwrite(buffOut, 1, output.pos, fout);

// Free the stream
ZSTD_freeCStream(cstream);
flanglet
  • Thanks! Would you mind updating your answer with the gist of the API that allows you to stream? The example is sort of hard to parse and I feel like I would miss some key points if I made the inferences myself – Curious Jun 10 '17 at 22:36
  • Not sure this is what you are looking for but I edited the answer with some pseudo code that contains the different steps to compress a buffer with the streaming API (from the example). – flanglet Jun 10 '17 at 22:58
  • I'm sorry, I was not clear before. I was looking for a way to decompress the data without knowing in advance what it contains or how much data is incoming. – Curious Jun 10 '17 at 23:00
  • @flanglet, does Zstd document its per-stream state size, as SLZ (http://www.libslz.org/) does for DEFLATE streams ("While zlib uses 256 kB of memory per stream in addition to a few tens of bytes for the stream descriptor itself, SLZ only stores a stream descriptor made of 28 bytes.")? – osgx Jun 10 '17 at 23:02
  • Look at the streaming_decompression.c example. The API ZSTD_decompressStream(dstream, &output, &input) returns the size of the next data to read (line 85 of the example). The first value is returned by ZSTD_initDStream(). Put this call in a loop (lines 81-89). – flanglet Jun 10 '17 at 23:06
  • @osgx The best answer I can find is here: http://fastcompression.blogspot.com/2016/04/working-with-streaming.html – flanglet Jun 10 '17 at 23:12

There is also SLZ, a stateless DEFLATE (zlib-compatible) compressor for streaming (compression only) to many clients with reduced per-stream state memory: http://www.libslz.org/ "Stateless ZIP library - SLZ":

SLZ is a fast and memory-less stream compressor which produces an output that can be decompressed with zlib or gzip. It does not implement decompression at all, zlib is perfectly fine for this. The purpose is to use SLZ in situations where a zlib-compatible stream is needed and zlib's resource usage would be too high while the compression ratio is not critical. The typical use case is in HTTP servers and gateways which have to compress many streams in parallel with little CPU resources to assign to this task, and without having to throttle the compression ratio due to the memory usage. In such an environment, the server's memory usage can easily be divided by 10 and the CPU usage by 3. In addition its high performance made it fill a gap in network backup applications.

While zlib uses 256 kB of memory per stream in addition to a few tens of bytes for the stream descriptor itself, SLZ only stores a stream descriptor made of 28 bytes. Thus it is particularly suited to environments having to deal with tens to hundreds of thousands of concurrent streams.

The key difference between zlib and SLZ is that SLZ is stateless in that it doesn't consider the data previously compressed as part of its dictionary. It doesn't hurt compression performance when it is fed large enough chunks (at least a few kB at once).
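The stateless idea, each chunk compressed with a fresh encoder and no shared dictionary, yet the pieces still concatenating into one valid stream, can be sketched with stock zlib. This uses Python's bindings for brevity and only illustrates the deflate-framing principle, not SLZ's actual implementation:

```python
import zlib

def compress_chunk(chunk: bytes, last: bool = False) -> bytes:
    """Compress one chunk with a brand-new (stateless) raw-deflate
    encoder. Z_FULL_FLUSH byte-aligns the output and carries no
    history, so independently produced chunks concatenate into one
    valid deflate stream; the last chunk terminates the stream."""
    enc = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate
    out = enc.compress(chunk)
    out += enc.flush(zlib.Z_FINISH if last else zlib.Z_FULL_FLUSH)
    return out

# Each call knows nothing about earlier data (worse ratio than a
# stateful encoder), but no per-stream state is kept between calls.
stream = compress_chunk(b"x" * 500) + compress_chunk(b"y" * 500, last=True)

dec = zlib.decompressobj(-15)          # one ordinary raw-deflate decoder
assert dec.decompress(stream) == b"x" * 500 + b"y" * 500
```

SLZ does the equivalent at the C level with its 28-byte stream descriptor; the point here is only that deflate's block framing permits stateless chunk-by-chunk compression readable by any standard inflater.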

osgx
  • zlib (deflate) supports streaming as is. The "stateless" business is just to reduce memory usage in the case of _many_ compression threads, though at the cost of compression effectiveness. – Mark Adler Jun 11 '17 at 06:53