According to the page you reference (I'm not familiar with this file format myself), each block of data is indexed by an offset field in the file's index. Since you know the length of the type and data-length fields that precede each data block, and you know the offset of the next block, you also know the length of each data block (i.e. the length of the compressed bytes).
That is, the length of each data block is simply the offset of the next block minus the offset of the current block, then minus the length of the type and data length fields (however many bytes that is…according to the documentation, it's variable, but you can certainly compute that length as you read it).
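That arithmetic can be sketched in a few lines. This is a minimal illustration, not specific to the file format in question; the function name and parameters are hypothetical:

```python
# Hypothetical sketch of the offset arithmetic described above.
def compressed_length(offsets, i, header_len):
    """Length of the compressed payload of block i.

    offsets    -- block offsets from the file's index, ascending
    i          -- index of the block of interest
    header_len -- combined size in bytes of the type and data-length
                  fields preceding block i (variable per the docs;
                  computed as you read them)
    """
    return offsets[i + 1] - offsets[i] - header_len

# Block starts at offset 100, next block at 350, with 4 bytes of
# type/length fields, so the compressed payload is 350 - 100 - 4 = 246.
print(compressed_length([100, 350], 0, 4))
```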
So:
1) I only know the size of the decompressed objects. The Read method documentation on DeflateStream states that it "Reads a number of decompressed bytes into the specified byte array," which is what I want. However, I do see people setting this count to the size of the compressed data, so one of us is doing it wrong.
The documentation is correct. DeflateStream is a subclass of Stream, and has to follow that class's rules. Since the Read() method of Stream outputs the number of bytes requested into the caller's buffer, these must be uncompressed bytes.
Note that per the above, you do know the size of the compressed objects. It's not stored in the file, but you can derive that information from the things that are stored in the file.
2) The data I'm getting back is correct, I think (human-readable data that looks right); however, it's advancing the underlying stream I give it all the way to the end! For example, I ask it for 187 decompressed bytes and it reads the remaining 212 bytes, all the way to the end of the stream: the whole stream is 228 bytes, and after the deflate read of 187 decompressed bytes its position is now 228. I can't seek backwards, as I don't know where the end of the compressed data is, and not all the streams I use are seekable anyway. Is it the expected behavior to consume the whole stream?
Yes, I would expect that to happen. Or at a minimum, I would expect some buffering to happen, so even if it didn't read all the way to the end of the stream, I would expect it to read at least some number of bytes past the end of the compressed data.
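You can see the same effect with any raw-DEFLATE decompressor. Here is a Python sketch using zlib, where wbits=-15 selects raw DEFLATE, the same format DeflateStream reads and writes: hand the decompressor more input than the compressed data actually occupies, and it consumes the extra bytes without complaint.

```python
import zlib

# Produce a raw DEFLATE payload (wbits=-15 means raw deflate, no
# zlib/gzip header -- the same format DeflateStream uses).
comp = zlib.compressobj(wbits=-15)
payload = comp.compress(b"human-readable data") + comp.flush()

# Append trailing bytes, as in a file where more blocks follow.
buffer = payload + b"NEXT BLOCK HEADER..."

# Feed the decompressor the whole buffer: it decompresses correctly,
# and the trailing bytes are swallowed into its internal state.
decomp = zlib.decompressobj(wbits=-15)
data = decomp.decompress(buffer)
assert data == b"human-readable data"

# zlib happens to expose the leftovers; DeflateStream does not, which
# is why the underlying Stream's position is of no help afterwards.
print(decomp.unused_data)
```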
It seems to me that you have at least a couple of options:
- For each block of data, compute the length of the data (per above), read that into a standalone MemoryStream object, and decompress the data from that stream rather than the original.
- Alternatively, go ahead and decompress directly from the source stream, using the offsets provided in the index to seek to each data block as you read it. Of course, this won't work with non-seekable streams, which you indicate occur in your scenario, so this option would not cover all of your cases.
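The first option can be sketched like this in Python, with io.BytesIO standing in for MemoryStream and zlib's raw-DEFLATE mode standing in for DeflateStream; the helper names here are hypothetical:

```python
import io
import zlib

def read_block(f, comp_len):
    """Option 1: copy exactly comp_len compressed bytes out of the
    source stream, then decompress from the standalone copy, so the
    source stream is left positioned at the start of the next block.

    f        -- source stream, positioned at the start of the
                compressed payload (after the type/length fields)
    comp_len -- compressed length computed from the index offsets
    """
    block = io.BytesIO(f.read(comp_len))   # MemoryStream analogue
    return zlib.decompressobj(wbits=-15).decompress(block.read())

# Hypothetical demo: two back-to-back compressed blocks in one stream.
def deflate(b):
    c = zlib.compressobj(wbits=-15)
    return c.compress(b) + c.flush()

b1, b2 = deflate(b"first block"), deflate(b"second block")
stream = io.BytesIO(b1 + b2)
assert read_block(stream, len(b1)) == b"first block"
# The source stream advanced only past block 1, so block 2 reads cleanly.
assert read_block(stream, len(b2)) == b"second block"
```

The point of the intermediate buffer is that the decompressor is free to over-read from it without disturbing the source stream, which also makes this approach work for non-seekable sources.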