I have a bunch of data in a byte[], which I compress using a GZipStream like this:

byte[] input = ...;

var zipped = new MemoryStream();
using (var zipper = new GZipStream(zipped, CompressionMode.Compress, true)) {
  zipper.Write(input, 0, input.Length);
}

Due to my technical requirements I need to split the result into chunks of, let's say, 50k each, so that each chunk can be decompressed on its own and restores the corresponding chunk of the original data.

If I just split the resulting byte[], the chunks no longer form valid GZip archives, so that doesn't work.

Nor can I use some kind of loop that stops zipping at a given chunk size, because GZipStream unfortunately cannot report the current length of the zipped data. I only get the Length once I close the zipping stream, but by then I already have a complete archive, so I cannot just continue from there.

How could I do this while keeping each chunk as a valid GZip archive?

Zoltán Tamási
  • https://unix.stackexchange.com/a/359316 may be worth a read. – mjwills Jul 16 '17 at 14:26
  • It might help if you tried writing the code to do this and then came back to ask about the parts you couldn't get to work. The SO help docs have a good explanation on how to write a question: https://stackoverflow.com/help – Joe Mayo Jul 17 '17 at 02:13

1 Answer

There is not an efficient way to do this, since you cannot predict the size of the compressed output without compressing. (Unless you have no compression and some expansion with only stored blocks, but I'm assuming that you need compression.)

You can look at this example for how to get as much compressed data into a fixed block size as possible. It makes three compression passes per block to achieve the fit, and it decompresses the compressed data twice to estimate how much uncompressed data will fit, then recompresses that guess.
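For comparison, here is a minimal C# sketch of a simpler, brute-force variant. It is not the three-pass method from the example above: it compresses each slice as its own gzip archive and binary-searches the slice length until the output fits. The names GzipChunker, Compress and Split are hypothetical, made up for illustration. The search assumes that the compressed size grows with the input length, which is not strictly guaranteed; if that fails, a chunk may hold less than the maximum, but every chunk still respects the limit.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, for illustration only.
public static class GzipChunker
{
    // Compresses input[offset .. offset + count) into one standalone gzip archive.
    public static byte[] Compress(byte[] input, int offset, int count)
    {
        using (var zipped = new MemoryStream())
        {
            using (var zipper = new GZipStream(zipped, CompressionMode.Compress, true))
            {
                zipper.Write(input, offset, count);
            }
            return zipped.ToArray();
        }
    }

    // Splits input into consecutive slices, each compressed separately
    // into at most maxChunkSize bytes.
    public static List<byte[]> Split(byte[] input, int maxChunkSize)
    {
        var chunks = new List<byte[]>();
        int offset = 0;
        while (offset < input.Length)
        {
            int remaining = input.Length - offset;

            // Cheap case first: everything that is left fits in one chunk.
            byte[] best = Compress(input, offset, remaining);
            int bestCount = remaining;

            if (best.Length > maxChunkSize)
            {
                // Binary search for the longest slice whose archive still fits.
                best = null;
                bestCount = 0;
                int lo = 1, hi = remaining - 1;
                while (lo <= hi)
                {
                    int mid = lo + (hi - lo) / 2;
                    byte[] candidate = Compress(input, offset, mid);
                    if (candidate.Length <= maxChunkSize)
                    {
                        best = candidate;
                        bestCount = mid;
                        lo = mid + 1;
                    }
                    else
                    {
                        hi = mid - 1;
                    }
                }
                if (best == null)
                    throw new ArgumentException("maxChunkSize is too small to hold even one input byte.");
            }

            chunks.Add(best);
            offset += bestCount;
        }
        return chunks;
    }
}
```

Each search step recompresses the slice from scratch, so this costs roughly log2(n) compressions per chunk. That is the inefficiency mentioned above, traded for very little code.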

You can't guarantee that the compressed data will exactly fit your block size, since adding one uncompressed byte could add two compressed bytes, skipping right over your exact block size. However, with the gzip format you can cheat and add junk bytes in the header to pad it out to the exact size.
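The header padding can be sketched in C# roughly as follows, assuming the junk bytes go into the optional "extra field" that RFC 1952 allows right after the fixed 10-byte header. GzipPadder and PadTo are hypothetical names, and the subfield ID "PD" is an arbitrary, unregistered pair chosen for illustration. The sketch also assumes the input header has neither an extra field nor a header CRC, and it rejects the input otherwise.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class GzipPadder
{
    // Returns a copy of a gzip archive padded to exactly targetSize bytes
    // by inserting an "extra field" (FEXTRA, RFC 1952) into its header.
    public static byte[] PadTo(byte[] gzip, int targetSize)
    {
        const int FixedHeaderSize = 10;
        const byte FlagHeaderCrc = 0x02; // FHCRC bit in the FLG byte
        const byte FlagExtra = 0x04;     // FEXTRA bit in the FLG byte

        if (gzip.Length < FixedHeaderSize || gzip[0] != 0x1f || gzip[1] != 0x8b)
            throw new ArgumentException("Not a gzip archive.");
        if ((gzip[3] & (FlagExtra | FlagHeaderCrc)) != 0)
            throw new NotSupportedException("Header already has an extra field or a header CRC.");

        int pad = targetSize - gzip.Length;
        if (pad == 0) return (byte[])gzip.Clone();

        // Smallest possible padding: XLEN (2 bytes) plus one empty subfield (4 bytes).
        if (pad < 6 || pad - 2 > ushort.MaxValue)
            throw new ArgumentException("Padding must be 0 or between 6 and 65537 bytes.");

        int xlen = pad - 2;     // total length of the extra field
        int dataLen = xlen - 4; // junk bytes inside the single subfield

        var result = new byte[targetSize];
        Array.Copy(gzip, 0, result, 0, FixedHeaderSize);
        result[3] |= FlagExtra;

        int p = FixedHeaderSize;
        result[p++] = (byte)(xlen & 0xff);    // XLEN, little-endian
        result[p++] = (byte)(xlen >> 8);
        result[p++] = (byte)'P';              // SI1 (arbitrary, unregistered)
        result[p++] = (byte)'D';              // SI2
        result[p++] = (byte)(dataLen & 0xff); // subfield LEN, little-endian
        result[p++] = (byte)(dataLen >> 8);
        p += dataLen;                         // junk bytes stay zero

        // Any other optional header fields, the deflate data and the trailer follow unchanged.
        Array.Copy(gzip, FixedHeaderSize, result, p, gzip.Length - FixedHeaderSize);
        return result;
    }
}
```

With this layout a chunk can only be padded by at least 6 bytes, so when fitting blocks you would aim for a compressed size that is either exactly the block size or at least 6 bytes under it.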

Mark Adler