36

I was in need of a way to compress images in .net so i looked into using the .net GZipStream class (or DeflateStream). However i found that decompression was not always successful, sometimes the images would decompress fine and other times i would get a GDI+ error that something is corrupted.

After investigating the issue i found that the decompression was not giving back all the bytes it compressed. So if i compressed 2257974 bytes i would sometimes get back only 2257870 bytes (real numbers).

The most funny thing is that sometimes it would work. So i created this little test method that compresses only 10 bytes and now i don't get back anything at all.

I tried it with both compression classes GZipStream and DeflateStream and i double checked my code for possible errors. I even tried positioning the stream to 0 and flushing all the streams but with no luck.

Here is my code:

    public static void TestCompression()
    {
        byte[] test = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

        byte[] result = Decompress(Compress(test));

        // This will fail, result.Length is 0
        Debug.Assert(result.Length == test.Length);
    }

    public static byte[] Compress(byte[] data)
    {
        var compressedStream = new MemoryStream();
        var zipStream = new GZipStream(compressedStream, CompressionMode.Compress);
        zipStream.Write(data, 0, data.Length);
        return compressedStream.ToArray();
    }

    public static byte[] Decompress(byte[] data)
    {
        var compressedStream = new MemoryStream(data);
        var zipStream = new GZipStream(compressedStream, CompressionMode.Decompress);
        var resultStream = new MemoryStream();

        var buffer = new byte[4096];
        int read;

        while ((read = zipStream.Read(buffer, 0, buffer.Length)) > 0) {
            resultStream.Write(buffer, 0, read);
        }

        return resultStream.ToArray();
    }
Bobby Z
  • 513
  • 1
  • 6
  • 11
  • Re your comment - it comes down to buffers at different levels; if they aren't all emptied (in the right order) then you don't get all the data. – Marc Gravell Nov 07 '08 at 04:59
  • Note, for example, that I didn't bother calling Close() on the MemoryStream itself - so I partly agree ;-p – Marc Gravell Nov 07 '08 at 05:00
  • I'll add an update on this, too.... – Marc Gravell Nov 07 '08 at 05:02
  • Totally right. (At one point i tried using Close() on the memory stream but the ToArray() method was complaining about it, which would mean i have to create a new buffer, empty the stream on it, close the stream, return the new buffer. Just to much work) – Bobby Z Nov 07 '08 at 14:48
  • 1
    I forgot to say: this was an excellently phrased question, in that the code sample made it very easy to show that a: it was broken, and b: when it was fixed. Three cheers. – Marc Gravell Nov 07 '08 at 20:55

2 Answers2

50

You need to Close() the ZipStream after adding all the data you want to compress; it retains a buffer of unwritten bytes internally (even if you Flush()) that needs to be written.

More generally, Stream is IDisposable, so you should also be using each... (yes, I know that MemoryStream isn't going to lose any data, but if you don't get into this habit, it will bite you with other Streams).

public static byte[] Compress(byte[] data)
{
    using (var compressedStream = new MemoryStream())
    using (var zipStream = new GZipStream(compressedStream, CompressionMode.Compress))
    {
        zipStream.Write(data, 0, data.Length);
        zipStream.Close();
        return compressedStream.ToArray();
    }
}

public static byte[] Decompress(byte[] data)
{
    using(var compressedStream = new MemoryStream(data))
    using(var zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (var resultStream = new MemoryStream())
    { ... }
}

[edit : updated re comment] Re not using things like MemoryStream - this is always a fun one, with lots of votes on either side of the fence: but ultimatey...

(rhetorical - we all know the answer...) How is MemoryStream implemented? is it a byte[] (owned by .NET)? is it a memory-mapped file (owned by the OS)?

The reason you aren't using it is because you are letting knowledge of internal implementation details change how you code against a public API - i.e. you just broke the laws of encapsulation. The public API says: I am IDisposable; you "own" me; therefore, it is your job to Dispose() me when you are through.

Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • Wow, It worked like a charm. It's interesting because i didn't think that Close() is necessary for internal memory since there is no windows resource involved (the same goes for the using block -- it was just cleaner without them) – Bobby Z Nov 07 '08 at 04:53
  • 1
    Close() isn't about freeing a Windows resource here. GZip requires a footer on the end of the data, and Close() tells GZipStream that you've finished writing data and it should write out the footer. – stevemegson Nov 07 '08 at 09:18
  • You might be correct about letting implementation details define how i right code. But lately Microsoft has been implementing to much the dispose pattern (even on none disposable objects) – Bobby Z Nov 07 '08 at 14:52
  • And as we all know dispose encapsulates everything in a try/catch/finally block. In this compression method which only uses memory i thought i can use the extra performance. – Bobby Z Nov 07 '08 at 14:53
  • BTW the new MVC framework has a Html.Form() method that can be used in a using block to automatically close the html form. (They are basically using the convenience of the using block which automatically calls some method after some operation eg closing the form after rendering the entire page) – Bobby Z Nov 07 '08 at 14:56
  • 4
    "i thought i can use the extra performance" - profile first; to be honest, I wouldn't expect an extra try/finally to make a whole heap of difference – Marc Gravell Nov 07 '08 at 20:26
  • +1 helpful, informative answer, with clear code plus response to questions - helped me when learning about compression streams – J M Mar 20 '09 at 19:03
3

Also - keep in mind the DeflateStream in System.IO.Compression does not implement the most efficient deflate algorithm. If you like, there is an alternative to the BCL GZipStream and DeflateStream; it is implemented in a fully-managed library based on zlib code, that performs better than the built-in {Deflate,GZip}Stream in this respect. [ But you still need to Close() the stream to get the full bytestream. ]

These stream classes are shipped in the DotNetZlib assembly, available in the DotNetZip distribution at http://DotNetZip.codeplex.com/.

Cheeso
  • 189,189
  • 101
  • 473
  • 713
  • 1
    You'll be pleased to hear that .NET 4.5 has woven in the zlib algorithm into the BCL now (with backwards compatibility for existing compressed data). See here for more info: http://msdn.microsoft.com/en-us/magazine/jj133817.aspx – pattermeister Feb 07 '13 at 17:03
  • 1
    Terrific! It took too long, but I'm glad it's finally there! – Cheeso Feb 10 '13 at 19:18