
I would like to store data in a compressed format between various applications (some in Python, some in Java, etc.) in such a way that:

  • the producer application can choose from among several formats (e.g. gzip/zstd/zlib/brotli)
  • the consumer application has all the information it needs to uncompress the data

Once the data is uncompressed, all applications know how to deal with the resulting information.

Is there a common/standard container format that records the compression algorithm type (e.g. by prepending the compressed data with a MIME type in ASCII)? Or does the compressed data produced by most methods already contain a header and magic number from which the compression type can be determined?
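The ad-hoc scheme floated in the question can be sketched in Python. This is a minimal sketch assuming a hypothetical layout: one ASCII MIME-style tag line terminated by `\n`, followed by the compressed bytes. The tag strings and the `pack`/`unpack`/`CODECS` names are illustrative, not part of any standard, and only stdlib codecs (gzip, zlib, and xz via `lzma`) are wired up, since zstd and brotli need third-party packages.

```python
import gzip
import lzma
import zlib

# Hypothetical tag -> (compress, decompress) table. The tags are illustrative;
# a real deployment would need every producer and consumer to agree on them.
CODECS = {
    b"application/gzip": (gzip.compress, gzip.decompress),
    b"application/zlib": (zlib.compress, zlib.decompress),
    b"application/x-xz": (lzma.compress, lzma.decompress),  # lzma defaults to the .xz format
}

def pack(data: bytes, tag: bytes) -> bytes:
    """Prefix the compressed payload with an ASCII tag and a newline."""
    compress, _ = CODECS[tag]
    return tag + b"\n" + compress(data)

def unpack(blob: bytes) -> bytes:
    """Split off the tag line (first newline) and dispatch to the decompressor."""
    tag, sep, payload = blob.partition(b"\n")
    if not sep or tag not in CODECS:
        raise ValueError(f"unknown or missing format tag: {tag[:40]!r}")
    _, decompress = CODECS[tag]
    return decompress(payload)
```

Splitting at the first newline is safe because the tags themselves never contain one, even though the compressed payload may.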

Jason S
  • https://stackoverflow.com/questions/39008957/is-there-a-way-to-check-if-a-buffer-is-in-brotli-compressed-format seems to indicate that brotli isn't autodetectable :/ – Jason S Nov 04 '20 at 16:53
  • zstd starts with hex `28 b5 2f fd` (https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#zstandard-frames) – Jason S Nov 04 '20 at 17:59
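Building on the magic numbers mentioned in the comments, a sniffing function can be sketched in Python. This is a heuristic under stated assumptions, not a reliable detector: zstd (`28 b5 2f fd`), xz (`fd 37 7a 58 5a 00`) and gzip (`1f 8b`) have fixed magic bytes, while zlib only has a two-byte header that satisfies a checksum, so arbitrary data (including raw brotli) can occasionally pass the zlib test. The name `sniff` is hypothetical.

```python
from typing import Optional

def sniff(buf: bytes) -> Optional[str]:
    """Guess the compression format from leading bytes; None if unknown."""
    if buf[:4] == b"\x28\xb5\x2f\xfd":   # zstd frame magic number
        return "zstd"
    if buf[:6] == b"\xfd7zXZ\x00":       # xz stream header magic
        return "xz"
    if buf[:2] == b"\x1f\x8b":           # gzip ID1, ID2
        return "gzip"
    if len(buf) >= 2:
        cmf, flg = buf[0], buf[1]
        # zlib: CM == 8 (deflate), CINFO <= 7, and CMF*256 + FLG is a multiple of 31.
        # This is only a plausibility check and can produce false positives.
        if cmf & 0x0F == 8 and cmf >> 4 <= 7 and (cmf * 256 + flg) % 31 == 0:
            return "zlib"
    # Raw brotli (like raw deflate) has no magic number, so there is nothing to match.
    return None
```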



The zip format is quite common and specifies the compression algorithm for each entry. It has method numbers for deflate (8), which is what gzip and zlib use, for zstd (93), and for xz (95), but not yet for brotli.
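To illustrate how a zip consumer learns the method from the archive itself, here is a minimal Python sketch using the stdlib `zipfile` module. It assumes only the methods that module supports out of the box; note that `zipfile.ZIP_LZMA` writes LZMA as method 14, not the xz method 95 mentioned above. The helper names `make_archive` and `read_archive` are hypothetical.

```python
import io
import zipfile

def make_archive(payload: bytes) -> bytes:
    """Producer: store the same payload twice, each entry with a different method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.bin", payload, compress_type=zipfile.ZIP_DEFLATED)  # method 8
        zf.writestr("b.bin", payload, compress_type=zipfile.ZIP_LZMA)      # method 14
    return buf.getvalue()

def read_archive(blob: bytes) -> dict:
    """Consumer: report each entry's method number and its decompressed contents."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {info.filename: (info.compress_type, zf.read(info))
                for info in zf.infolist()}
```

The consumer never has to be told which method was used; it reads the number from the entry header and dispatches on it.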

As for individual wrappers, that is exactly what zlib and gzip are, and zstd also has a detectable wrapper. Raw brotli, however, is difficult to detect. I am not aware of consistent use of a brotli wrapper. See this lovely answer for why. There was a proposal for a brotli wrapper (also lovely), but I don't think it is in use.
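Because zlib and gzip are wrappers with recognizable headers, zlib itself can tell them apart. A minimal Python sketch: passing `wbits = 32 + 15` to `zlib.decompress` asks zlib to auto-detect a zlib or gzip header (this covers only those two wrappers, not zstd or brotli). Raw deflate, made with negative `wbits`, has no wrapper at all, which is the same situation as raw brotli. The helper names `inflate_any` and `raw_deflate` are hypothetical.

```python
import zlib

def inflate_any(buf: bytes) -> bytes:
    """Decompress a zlib- or gzip-wrapped deflate stream.

    Adding 32 to wbits makes zlib detect the wrapper from the header bytes.
    """
    return zlib.decompress(buf, wbits=32 + zlib.MAX_WBITS)

def raw_deflate(data: bytes) -> bytes:
    """Deflate with no wrapper at all (negative wbits): nothing left to detect."""
    c = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    return c.compress(data) + c.flush()
```

`inflate_any` accepts output from both `zlib.compress` and `gzip.compress`, but rejects `raw_deflate` output with `zlib.error`; the consumer must already know the stream is raw deflate and call `zlib.decompress(buf, wbits=-15)`.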

Mark Adler