I am looking at making a file format that interleaves two types of chunks of raw bytes.
One chunk will contain a block of bzip2-compressed data, which has a header containing the usual bzip2 magic number (BZh9
).
The second chunk will consist of the other data of interest, which has a header containing a different magic number (TBD).
The two magic numbers would be used for seeking, identifying and processing the two data block types differently.
My question is: Is there a magic number I can pick for the second block type, which would very unlikely (or better, impossible) to be found inside a bzip2-compressed block of bytes?
In other words, are there particular bytes that bzip2 excludes or would be probabilistically unlikely to use when compressing, within some statistical threshold, which I could use for a header for another data type in the same file?
One option is that, when I find header bytes for a second block type, I would simply try to process data in the second block type, and if that processing fails, then I assume I am accidentally inside a compressed bzip2 block. But I'd like to know if there is the possibility that there are bytes that would not be found in a bzip2 block, or would not be likely to be found.