It's my understanding that Brotli stores block size information in a meta-block header with only the final uncompressed size of the block, and no information about the compressed length (section 9.2). I'm guessing that a wrapper would need to be created in order to use it with multiple threads, or possibly something similar to Mark Adler's pigz.

Would the same threading principles apply to Brotli as they do with gzip in this case, or are there any foreseeable issues to be aware of when it comes to multithreading implementations?

l'L'l

1 Answer

You can use the brotli format as is for this purpose. I got them to add the option of putting metadata in empty meta-blocks (where "empty" means that the meta-block produces zero uncompressed data). You can put markers in metadata to aid in finding meta-blocks. An inserted empty meta-block also starts the next meta-block at a byte boundary.
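To make the layout concrete, here is a minimal sketch in Python of how such an empty meta-block with metadata could be encoded by hand. It follows my reading of the meta-block header fields in the format specification (ISLAST, MNIBBLES, a reserved bit, MSKIPBYTES, then MSKIPLEN - 1). The function name `metadata_meta_block` and the `MARK` payload are made up for illustration, and the sketch assumes the header happens to start on a byte boundary; in a real stream the header bits continue from wherever the previous meta-block ended.

```python
def metadata_meta_block(payload: bytes = b"") -> bytes:
    """Encode a non-final, empty meta-block that carries `payload` as metadata.

    Assumption: the header starts on a byte boundary. Bits are packed
    least-significant bit first, as in the rest of the brotli format.
    """
    n = len(payload)
    if n > 1 << 24:
        raise ValueError("metadata is limited to 2**24 bytes per meta-block")

    # MSKIPBYTES: number of bytes used to store MSKIPLEN - 1 (0 if no metadata).
    # Using the minimal count keeps the top byte non-zero when MSKIPBYTES > 1.
    skip_bytes = 0 if n == 0 else max(1, ((n - 1).bit_length() + 7) // 8)

    bits = 0
    bits |= 0 << 0           # ISLAST = 0 (more meta-blocks follow)
    bits |= 3 << 1           # MNIBBLES code 11: no uncompressed data
    bits |= 0 << 3           # reserved bit, must be zero
    bits |= skip_bytes << 4  # MSKIPBYTES (2 bits)
    nbits = 6
    if skip_bytes:
        bits |= (n - 1) << nbits  # MSKIPLEN - 1
        nbits += 8 * skip_bytes

    # Rounding up to whole bytes supplies the zero fill bits.
    header = bits.to_bytes((nbits + 7) // 8, "little")
    return header + payload


marker = metadata_meta_block(b"MARK")  # hypothetical marker bytes
```

Whatever follows the returned bytes starts on a byte boundary, which is what makes the next meta-block easy to locate and to splice.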

Each meta-block can be independent of the other meta-blocks. If the stream is constructed that way, then there is no issue with combining them when compressing or separately decompressing them. The areas of possible dependency are the ring buffer of the last four distances used, and backward references past the beginning of the current meta-block. For parallel use, a meta-block can and must be constructed so that it does not depend on the previous meta-block's last four distances: it must not refer to the ring buffer until the buffer has been filled with distances from the current meta-block. In addition, distances that reach back before the current meta-block would not be allowed (which includes no static dictionary references). Lastly, you would append an empty or metadata meta-block to bring the sequence to a byte boundary for easy concatenation.
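The same pattern can be sketched with gzip's underlying format, which is the analogy the question asks about. Note that this is not brotli: it uses raw deflate from Python's standard `zlib` module as a stand-in, where a fresh compressor per chunk plays the role of an independent meta-block and the empty stored block written by `Z_SYNC_FLUSH` plays the role of the trailing empty meta-block. The names `parallel_deflate` and `_compress_chunk`, the chunk size, and the worker count are all illustrative choices.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 128 * 1024  # hypothetical chunk size


def _compress_chunk(job):
    chunk, is_last = job
    # A fresh compressor per chunk: nothing can refer back before the chunk.
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15 selects raw deflate
    out = comp.compress(chunk)
    # Z_SYNC_FLUSH ends the piece with an empty stored block on a byte
    # boundary and does not mark the end of the stream; only the last
    # chunk is finished with Z_FINISH.
    out += comp.flush(zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH)
    return out


def parallel_deflate(data: bytes, chunk_size: int = CHUNK, workers: int = 4) -> bytes:
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    jobs = [(c, i == len(chunks) - 1) for i, c in enumerate(chunks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the pieces concatenate in order.
        return b"".join(pool.map(_compress_chunk, jobs))


data = bytes(range(256)) * 4000
stream = parallel_deflate(data)
restored = zlib.decompress(stream, -15)  # an ordinary single-threaded decoder
```

The cost of full independence is some compression ratio, since each chunk starts with no history to refer back to.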

By the way, it looks like you're linking to an older version of the draft format. Here is a link to the current version.

Mark Adler
  • That helps a bunch! I was a bit worried about the meta-blocks (as the older draft had proposed), so I'm really glad to see that the option you had them put in is now available. Is there a preferred or recommended way to verify/checksum the stream? I was thinking that might not be an issue if everything was taking place locally, but otherwise there might be some need for it. Thank you very much! – l'L'l Jul 03 '16 at 19:43
  • They asked me to propose a wrapper format for brotli with integrity checks and other features, which I did and which you can [find here](https://github.com/madler/brotli/blob/master/br-format-v3.txt). However I don't know that they accepted it or recommend it. – Mark Adler Jul 03 '16 at 19:51