0

Given a string, how can I compress it in memory with gzip? I'm using Lua.


It sounds like an easy problem, but there is a huge list of libraries. So far, all that I tried were either dead or I could produce only zlib compressed strings. In my use case, I need gzip compression, as it is expected by the receiver.

As a test, if you dump the compressed string to a file, zcat should be able to decompress it.

I'm using OpenResty, so any Lua library should be fine.

(The only solution that I got working so far is to dump the string in a file, call os.execute("gzip /tmp/example.txt") and read it back. Unfortunately, that is not a practical solution.)

Philipp Claßen
  • 41,306
  • 31
  • 146
  • 239

1 Answers1

4

It turns out that zlib is not far away from gzip. The difference is that gzip has an additional header.

To get this header, you can use lua-zlib like this:

local zlib = require "zlib"

-- input:  string
-- output: string compressed with gzip
function compress(str)
   local level = 5
   local windowSize = 15+16
   return zlib.deflate(level, windowSize)(str, "finish")
end

Explanation:

  • The second parameter of deflate is the window size. It makes sure that a gzip header is written. If you omit the parameter, you get a zlib compressed string.
  • level is the gzip compression level (1=worst to 9=best)

Here is the documentation of the deflate (source: lua-zlib documentation):

function stream = zlib.deflate([ int compression_level ], [ int window_size ])

If no compression_level is provided uses Z_DEFAULT_COMPRESSION (6),
compression level is a number from 1-9 where zlib.BEST_SPEED is 1
and zlib.BEST_COMPRESSION is 9.

Returns a "stream" function that compresses (or deflates) all
strings passed in.  Specifically, use it as such:

string deflated, bool eof, int bytes_in, int bytes_out =
        stream(string input [, 'sync' | 'full' | 'finish'])

    Takes input and deflates and returns a portion of it,
    optionally forcing a flush.

    A 'sync' flush will force all pending output to be flushed to
    the return value and the output is aligned on a byte boundary,
    so that the decompressor can get all input data available so
    far.  Flushing may degrade compression for some compression
    algorithms and so it should be used only when necessary.

    A 'full' flush will flush all output as with 'sync', and the
    compression state is reset so that decompression can restart
    from this point if previous compressed data has been damaged
    or if random access is desired. Using Z_FULL_FLUSH too often
    can seriously degrade the compression. 

    A 'finish' flush will force all pending output to be processed
    and results in the stream become unusable.  Any future
    attempts to print anything other than the empty string will
    result in an error that begins with IllegalState.

    The eof result is true if 'finish' was specified, otherwise
    it is false.

    The bytes_in is how many bytes of input have been passed to
    stream, and bytes_out is the number of bytes returned in
    deflated string chunks.
Philipp Claßen
  • 41,306
  • 31
  • 146
  • 239