I am having trouble programmatically unzipping a 3GB (7GB uncompressed) gzip file using the built-in .NET 4.0 GZipStream and DeflateStream classes.

My understanding is that both should support files over 4GB, but that does not seem to be the case. When I manually unzip the file with WinRAR, stream through the resulting CSV file with a StreamReader, and count lines, I get the expected result: about 75 million lines. However, when I do the same through GZipStream or DeflateStream, the StreamReader stops a little more than halfway through (right around the 4GB mark), reports end of stream, and finishes without an error. With those streams I only get to about line 34 million.

I then tried the latest binary of DotNetZip (http://dotnetzip.codeplex.com/). It gets about halfway through and throws an error: "Destination array was not long enough. Check destIndex and length, and the array's lower bounds."

I didn't create this file, but we have never had trouble with smaller files from the same source, so I suspect the size is causing the problem. It could be that the tools used to create the file don't handle sizes beyond the 32-bit range, but before I bother the creators I want to make sure the bug isn't on our side, in the gzip extraction logic.

Any thoughts would be greatly appreciated. Example extraction code and testing methodology below:

var msGZ = 0; // gives about 34 million
var fileName = @"C:\MyFile.csv.gz";
using (System.IO.Stream input = System.IO.File.OpenRead(fileName))
using (var gz = new GZipStream(input, CompressionMode.Decompress))
using (var r = new StreamReader(gz))
{
    while (!r.EndOfStream)
    {
        r.ReadLine();
        msGZ++;
    }
}



var msDF = 0; // gives about 34 million
using (System.IO.Stream input = System.IO.File.OpenRead(fileName))
using (var df = new DeflateStream(input, CompressionMode.Decompress))
using (var r = new StreamReader(df))
{
    while (!r.EndOfStream)
    {
        r.ReadLine();
        msDF++;
    }
}



var csvCount = 0; // roughly 75 million lines

using (var ms = System.IO.File.OpenRead("UncompressedBYWinRAR.csv"))
using (var r = new StreamReader(ms))
{
    while (!r.EndOfStream)
    {
        r.ReadLine();
        csvCount++;
    }
}




var zipNet = 0;

// DotNetZip throws this error halfway through, at around line 34 million:
// "Destination array was not long enough. Check destIndex and length, and the array's lower bounds."

using (System.IO.Stream input = System.IO.File.OpenRead(fileName))
using (Stream decompressor = new Ionic.Zlib.GZipStream(input, Ionic.Zlib.CompressionMode.Decompress, true))
using (var r = new StreamReader(decompressor))
{
    while (!r.EndOfStream)
    {
        r.ReadLine();
        zipNet++;
    }
}
Glenn
  • Looks similar to [this stackoverflow post](http://stackoverflow.com/questions/505190/net-deflatestream-4gb-limit). Maybe it will help you? – Marcin Deptuła Apr 01 '11 at 15:22
  • I think someone from Microsoft actually commented on that post that they have removed the limit. That is where my assumption that it should work comes from: http://stackoverflow.com/questions/505190/net-deflatestream-4gb-limit/505887#505887 – Glenn Apr 01 '11 at 15:28
  • The .NET v2/3.5 compression classes look like they had a 4GB limitation, but that remark was removed for .NET 4. Could it possibly be using the wrong class files? Can you try doing something gratuitously .NET v4? :-) – Ken Apr 01 '11 at 15:32
  • I've read that, then I checked the [msdn site](http://msdn.microsoft.com/en-us/library/bc2dbwea.aspx) (.NET 4.0 version), and it says that compressing files larger than 4GB still throws an exception. There's nothing about decompressing, though. Maybe it's a bug :-) – Marcin Deptuła Apr 01 '11 at 15:34
  • Post bug reports with supporting documents to connect.microsoft.com – Hans Passant Apr 01 '11 at 15:35
  • I will run some more tests and report back. Can any of you recommend an open source alternative? DotNetZip was recommended, but it fails on files this size. – Glenn Apr 01 '11 at 16:02

1 Answer

Using GZipInputStream from SharpZipLib instead of System.IO.Compression.GZipStream solved the problem.
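
The answer gives no code, so here is a minimal sketch of what that swap might look like, mirroring the line-counting test from the question. It assumes the SharpZipLib package is referenced and that GZipInputStream lives in the ICSharpCode.SharpZipLib.GZip namespace; the class name GzipLineCounter and the CountLines helper are made up for illustration, and this has not been run against the 7GB file in question.

```csharp
using System;
using System.IO;
using ICSharpCode.SharpZipLib.GZip; // assumption: SharpZipLib is referenced by the project

// Hypothetical helper, not part of SharpZipLib or the original post.
public static class GzipLineCounter
{
    // Counts text lines in a gzip-compressed stream by decompressing it on the fly.
    public static long CountLines(Stream compressed)
    {
        long lines = 0;
        using (var gz = new GZipInputStream(compressed))
        using (var reader = new StreamReader(gz))
        {
            // ReadLine returns null once the decompressed stream is exhausted.
            while (reader.ReadLine() != null)
            {
                lines++;
            }
        }
        return lines;
    }

    public static void Main(string[] args)
    {
        // Same placeholder path as in the question; pass a real path as the first argument.
        var fileName = args.Length > 0 ? args[0] : @"C:\MyFile.csv.gz";
        using (Stream input = File.OpenRead(fileName))
        {
            Console.WriteLine(CountLines(input));
        }
    }
}
```

If the count matches the roughly 75 million lines obtained from the WinRAR-extracted CSV, that would suggest the file itself is fine and the early end of stream comes from the decompressor.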

mohit bansal