Some other questions here have been about being able to decompress only a part/chunk of a large compressed file, allowing some sort of "random access decompression". Bzip2 has always been among the recommendations for such a feature.
Reading about bzip2 on Wikipedia and in a document referred to as the informal specification, it was not completely clear at what level this ability to separately decompress a part of a bzip2 file occurs. There seem to be two options: a) it is at the level of BZipStreams, or b) it is even at the level of StreamBlocks (of which, to my understanding, there can be one or more inside a BZipStream).
BZipFile:=BZipStream+
└──BZipStream:=StreamHeader StreamBlock* StreamFooter
   ├──StreamHeader:=HeaderMagic Version Level
   ├──StreamBlock:=BlockHeader BlockTrees BlockData
   │  ├──BlockHeader:=BlockMagic BlockCRC Randomized OrigPtr
   │  └──BlockTrees:=SymMap NumTrees NumSels Selectors Trees
   │     ├──SymMap:=MapL1 MapL2{1,16}
   │     ├──Selectors:=Selector{NumSels}
   │     └──Trees:=( BitLen Delta{NumSyms} ){NumTrees}
   └──StreamFooter:=FooterMagic StreamCRC Padding
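As a small illustration of the StreamHeader production above, here is a minimal sketch that checks HeaderMagic ("BZ"), Version ("h") and Level (an ASCII digit 1 to 9) and returns the level. The function name `parse_stream_header` is hypothetical; the nominal block size of level × 100,000 bytes is the documented meaning of bzip2's -1 to -9 options. Note that only this 4-byte header is byte-aligned; everything after it is read bit by bit.

```python
import bz2


def parse_stream_header(data: bytes) -> int:
    """Return the Level (1-9) from a bzip2 StreamHeader (hypothetical helper).

    StreamHeader := HeaderMagic ("BZ") Version ("h") Level ("1".."9"),
    which occupies the first four bytes of every BZipStream.
    """
    if len(data) < 4 or data[:2] != b"BZ":
        raise ValueError("missing HeaderMagic 'BZ'")
    if data[2:3] != b"h":
        raise ValueError("unsupported Version (expected 'h')")
    level = data[3] - ord("0")  # ASCII digit -> integer
    if not 1 <= level <= 9:
        raise ValueError("Level must be a digit 1-9")
    return level


# Usage: the level determines the nominal uncompressed block size.
level = parse_stream_header(bz2.compress(b"hello", compresslevel=9))
print(level, level * 100_000)  # 9 900000
```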
Although bzip2 is often praised, it seems to me that the archive data is not byte-aligned but bit-aligned within each BZipStream, which would suggest that separate decompression of individual blocks was not something that was intended. I cannot be sure, though, hence this question :)
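To make the bit-alignment point concrete: because a block can start at any bit offset, even reading a header field such as BlockCRC (the 32 bits right after the 48-bit BlockMagic) requires bit-level access. The sketch below assumes bzip2's most-significant-bit-first bit order; `read_bits` is a hypothetical helper, not part of any library. A tool that wants to hand a single block to a normal decompressor would have to re-pack the block's bits into a fresh, byte-aligned stream for the same reason.

```python
import bz2


def read_bits(data: bytes, bit_offset: int, nbits: int) -> int:
    """Read nbits starting at an arbitrary bit offset, MSB first (hypothetical helper).

    Raises IndexError if the requested range runs past the end of data.
    """
    value = 0
    for i in range(bit_offset, bit_offset + nbits):
        bit = (data[i // 8] >> (7 - i % 8)) & 1  # bit i, counting from the MSB of byte 0
        value = (value << 1) | bit
    return value


data = bz2.compress(b"abc")
# The first BlockMagic directly follows the 4-byte StreamHeader (bit 32),
# so this first block happens to be byte-aligned; later blocks generally are not.
assert read_bits(data, 32, 48) == 0x314159265359
block_crc = read_bits(data, 32 + 48, 32)  # BlockCRC follows BlockMagic
print(hex(block_crc))
```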
Update
A look at the bzip2recover manual page (man bzip2recover) says:
bzip2 compresses files in blocks, usually 900kbytes long. Each block is handled independently. If a media or transmission error causes a multi-block .bz2 file to become damaged, it may be possible to recover data from the undamaged blocks in the file.
The compressed representation of each block is delimited by a 48-bit pattern, which makes it possible to find the block boundaries with reasonable certainty. Each block also carries its own 32-bit CRC, so damaged blocks can be distinguished from undamaged ones.
This would strongly suggest that each block can be decompressed separately. Is this correct?
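For reference, the 48-bit delimiter mentioned in the manual page is the BlockMagic 0x314159265359 (the BCD digits of pi), and the stream's FooterMagic is 0x177245385090 (the BCD digits of sqrt(pi)). Since these patterns are not byte-aligned, finding them means sliding a 48-bit window over the data one bit at a time. The sketch below does exactly that; `find_magic_bit_offsets` is a hypothetical name, and it is a slow, pure-Python scan meant only to show the idea. A match inside compressed data can in principle be a false positive, which is why the manual speaks of "reasonable certainty"; a real tool would confirm each candidate, for example by decoding the block and checking its CRC.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # BCD digits of pi
EOS_MAGIC = 0x177245385090    # BCD digits of sqrt(pi), starts the StreamFooter
MASK48 = (1 << 48) - 1


def find_magic_bit_offsets(data: bytes, magic: int = BLOCK_MAGIC) -> list[int]:
    """Return every bit offset in data where the 48-bit magic begins (hypothetical helper)."""
    offsets = []
    window = 0  # the most recent 48 bits seen
    nbits = 0   # total bits consumed so far
    for byte in data:
        for shift in range(7, -1, -1):  # MSB first, matching bzip2's bit order
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK48
            nbits += 1
            if nbits >= 48 and window == magic:
                offsets.append(nbits - 48)
    return offsets


data = bz2.compress(b"abc")
print(find_magic_bit_offsets(data)[0])  # 32: first block starts right after "BZh9"
```

Offsets that are not multiples of 8 are exactly the bit-aligned block starts discussed in the question.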