Correct way to use GZipStream in dotNET C#

Question

I'm currently working with GZipStream on .NET 3.5. I have the two methods listed below. As input I use a 2 MB text file consisting entirely of the character 's'. This code works fine on .NET 4.5, but on .NET 3.5, after compressing and decompressing, I get a 435 KB file, which of course doesn't match the source file. If I decompress the file with WinRAR, the result looks good (identical to the source file). If I decompress the file with GZipStream from .NET 4.5 (the file having been compressed with GZipStream from .NET 3.5), the result is bad.

UPD: In general I really do need to read the file as several separate gzip chunks. In this case, all the bytes of the compressed file are read in a single call to Read(), so I still don't understand why decompression doesn't work.

    public void CompressFile()
    {
        string fileIn = @"D:\sin2.txt";
        string fileOut = @"D:\sin2.txt.pgz";

        using (var fout = File.Create(fileOut))
        {
            using (var fin = File.OpenRead(fileIn))
            {
                using (var zip = new GZipStream(fout, CompressionMode.Compress))
                {
                    var buffer = new byte[1024 * 1024 * 10];
                    int n = fin.Read(buffer, 0, buffer.Length);
                    zip.Write(buffer, 0, n);
                }
            }
        }
    }
    public void DecompressFile()
    {
        string fileIn = @"D:\sin2.txt.pgz";
        string fileOut = @"D:\sin2.1.txt";

        using (var fsout = File.Create(fileOut))
        {
            using (var fsIn = File.OpenRead(fileIn))
            {
                var buffer = new byte[1024 * 1024 * 10];
                int n;
                while ((n = fsIn.Read(buffer, 0, buffer.Length)) > 0)
                {
                    using (var ms = new MemoryStream(buffer, 0, n))
                    {
                        using (var zip = new GZipStream(ms, CompressionMode.Decompress))
                        {
                            int nRead = zip.Read(buffer, 0, buffer.Length);
                            fsout.Write(buffer, 0, nRead);
                        }
                    }
                }
            }
        }
    }

score 3 · Accepted Answer · answered Jan 03 '16 at 08:35

You're trying to decompress each "chunk" as if it's a separate gzip file. Don't do that - just read from the GZipStream in a loop:

using (var fsout = File.Create(fileOut))
{
    using (var fsIn = File.OpenRead(fileIn))
    {
        using (var zip = new GZipStream(fsIn, CompressionMode.Decompress))
        {
            var buffer = new byte[1024 * 32];
            int bytesRead;

            while ((bytesRead = zip.Read(buffer, 0, buffer.Length)) > 0)
            {
                fsout.Write(buffer, 0, bytesRead);
            }
        }
    }
}

Note that your compression code should look similar, reading in a loop rather than assuming a single call to Read will read all the data.
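As a rough illustration of that note, the compression side could be written with the same read loop. This is only a sketch: the class name `GzipLoopSketch`, the method parameters, and the 32 KB buffer size are assumptions made for the example, not part of the original code.

```csharp
using System.IO;
using System.IO.Compression;

static class GzipLoopSketch
{
    // Compresses fileIn into fileOut as a single gzip stream.
    // Read is called in a loop because one call may return fewer
    // bytes than requested even before the end of the file.
    public static void CompressFile(string fileIn, string fileOut)
    {
        using (var fin = File.OpenRead(fileIn))
        {
            using (var fout = File.Create(fileOut))
            {
                using (var zip = new GZipStream(fout, CompressionMode.Compress))
                {
                    var buffer = new byte[1024 * 32]; // assumed size, not tuned
                    int bytesRead;

                    while ((bytesRead = fin.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        zip.Write(buffer, 0, bytesRead);
                    }
                }
            }
        }
    }
}
```

Disposing the GZipStream before the output file stream matters here, since that is what flushes the final compressed block and the gzip trailer.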

(Personally I'd skip fsIn, and just use new GZipStream(File.OpenRead(fileIn), CompressionMode.Decompress), but that's just a personal preference.)

Jon Skeet


Hmmm, your code works fine. But anyway, in my code I use a chunk that is big enough (10MB) to read all the bytes from the file at once, so do you have any idea why it doesn't work? In general I need to read the compressed file in separate chunks because I compress the source file with several threads, and as a result I have a file with a 10-byte header followed by a sequence of compressed chunks, each with a 4-byte prefix holding the length of the next chunk. – Leo Jan 03 '16 at 08:41
@Leo: You're still assuming that just because you've provided a buffer big enough, it *will* read everything in a single call. Streams don't work like that, in general - you shouldn't rely on a single call to `Read` reading the whole of the file. – Jon Skeet Jan 03 '16 at 09:04
Thanks, I see. So as far as I understand, in order to read exactly 'count' bytes I need to do the following: 'int nRead = 0; while (nRead != count) { nRead += inputStream.Read(buffer, nRead, count - nRead); }' Am I right? – Leo Jan 03 '16 at 09:49
In which cases can Read return fewer bytes than count (other than when EOF is reached)? I guess it might happen if the file is being downloaded from the web, but I'm not sure about this; maybe there are other possible cases. – Leo Jan 03 '16 at 11:03
@Leo: Any number of cases, really. Imagine you're using FileStream but it's on a network share... That could decide to give you only some of the data. Decryption might be easier to implement if it only returned appropriate chunks. Basically, unless it's guaranteed not to, don't assume it won't :) – Jon Skeet Jan 03 '16 at 13:08
OK, thanks! Now everything I need works fine, and I added a check for the Read() result. – Leo Jan 03 '16 at 13:21
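
One caveat about the loop proposed in the comments above: if the stream ends before 'count' bytes have arrived, Read keeps returning 0 and the loop never terminates. A possible shape for a helper that also checks for that case is sketched below. The names `StreamSketch` and `ReadFully` are hypothetical, chosen only for this example.

```csharp
using System.IO;

static class StreamSketch
{
    // Hypothetical helper: fills buffer[offset .. offset + count) from input,
    // calling Read repeatedly until all requested bytes have arrived.
    // Throws EndOfStreamException if the stream ends first.
    public static void ReadFully(Stream input, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = input.Read(buffer, offset + total, count - total);
            if (n == 0)
            {
                // Read returns 0 only at end of stream.
                throw new EndOfStreamException("Stream ended before all requested bytes were read.");
            }
            total += n;
        }
    }
}
```

For the chunked format described in the comments, a helper like this could presumably be used to read the 4-byte length prefix and then the chunk body, though the exact layout is only known from that description.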

score 3 · Answer 2 · answered Jan 03 '16 at 09:02

First, as @Jon Skeet mentioned, you are not using the Stream.Read method correctly. It doesn't matter whether your buffer is big enough or not: the stream is allowed to return fewer bytes than requested, with zero indicating that there is no more data, so reading from a stream should always be performed in a loop.

However, the main problem in your decompression code is the way you share the buffer. You read the input into a buffer, then wrap it in a MemoryStream (note that the constructor used does not make a copy of the passed array, but actually sets it as its internal buffer), and then you try to read from and write to that buffer at the same time. Taking into account that decompression writes data "faster" than it reads, it's surprising that your code works at all.
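
The buffer sharing can be seen in isolation with a few lines. This is a small sketch added for illustration; the class and method names are made up for the example.

```csharp
using System.IO;

static class SharedBufferSketch
{
    // Returns the first byte read from a MemoryStream that wraps an array,
    // after the array has been modified behind the stream's back.
    public static int FirstByteAfterOverwrite()
    {
        var buffer = new byte[] { 1, 2, 3 };
        using (var ms = new MemoryStream(buffer, 0, buffer.Length))
        {
            buffer[0] = 42;       // modify the array, not the stream
            return ms.ReadByte(); // 42: the stream reads from the same array
        }
    }
}
```

In the question's code the same thing happens on a larger scale: zip.Read writes decompressed bytes into the very array that the MemoryStream is still supplying compressed bytes from.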

The correct implementation is quite simple:

static void CompressFile()
{
    string fileIn = @"D:\sin2.txt";
    string fileOut = @"D:\sin2.txt.pgz";
    using (var input = File.OpenRead(fileIn))
    using (var output = new GZipStream(File.Create(fileOut), CompressionMode.Compress))
        Write(input, output);
}

static void DecompressFile()
{
    string fileIn = @"D:\sin2.txt.pgz";
    string fileOut = @"D:\sin2.1.txt";
    using (var input = new GZipStream(File.OpenRead(fileIn), CompressionMode.Decompress))
    using (var output = File.Create(fileOut))
        Write(input, output);
}

static void Write(Stream input, Stream output, int bufferSize = 10 * 1024 * 1024)
{
    var buffer = new byte[bufferSize];
    for (int readCount; (readCount = input.Read(buffer, 0, buffer.Length)) > 0;)
        output.Write(buffer, 0, readCount);
}

Well spotted! I hadn't even noticed that bit... Note that in .NET 4, `Stream` gained a `CopyTo` like your `Write` method. I've personally used an extension method (also called `CopyTo`) to do the same thing... although I wouldn't choose 10MB as a default buffer size. (I'd also strongly encourage using braces for using statements, loops etc... I've seen too many mistakes caused by the lack of them.) – Jon Skeet Jan 03 '16 at 09:06
@JonSkeet You are right, of course. I used a custom name and the 10MB default size just to match the OP's code. As for braces, I don't use them for single statements, but I understand what you are talking about :) – Ivan Stoev Jan 03 '16 at 09:33
Thanks. My first mistake is here: "then wrap it in a MemoryStream (note that the constructor used does not make a copy of the passed array, but actually sets it as its internal buffer)". Adding a new buffer for copying bytes between streams solved the problem. And the second mistake, as @JonSkeet said, is related to the use of streams; from MSDN: "The implementation will block until at least one byte of data can be read, in the event that no data is available". Thanks a lot. I will correct my mistakes and then hopefully the program will work fine. – Leo Jan 03 '16 at 09:38
Sorry, wrong quotation. I meant "_An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached._" – Leo Jan 03 '16 at 09:57
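
For readers on .NET 4 or later, the `CopyTo` method mentioned in the comments removes the need for a hand-written copy loop. A minimal sketch follows, assuming the same single-stream gzip file as in the answers; the class name and parameters are illustrative only. It does not apply to .NET 3.5, where `Stream.CopyTo` is not available.

```csharp
using System.IO;
using System.IO.Compression;

static class CopyToSketch
{
    // Decompresses a gzip file using Stream.CopyTo (available since .NET 4),
    // which performs the read/write loop internally.
    public static void DecompressFile(string fileIn, string fileOut)
    {
        using (var input = new GZipStream(File.OpenRead(fileIn), CompressionMode.Decompress))
        {
            using (var output = File.Create(fileOut))
            {
                input.CopyTo(output);
            }
        }
    }
}
```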
