I am having trouble programmatically decompressing a 3 GB gzip file (7 GB uncompressed) using the built-in .NET 4.0 GZipStream and DeflateStream classes.
My understanding is that both classes should support files over 4 GB, but that does not seem to be the case here. When I manually extract the file with WinRAR, then read the resulting CSV file with a StreamReader and count lines, I get the expected result: about 75 million lines. However, when I do the same through GZipStream or DeflateStream, the StreamReader stops a little more than halfway through (right around the 4 GB mark), reports end of stream, and finishes without an error. With those streams I only get to about line 34 million before the stream ends.
I then tried the latest binary of DotNetZip (http://dotnetzip.codeplex.com/). It gets about halfway through and then throws an exception: "Destination array was not long enough. Check destIndex and length, and the array's lower bounds."
I didn't create this file, but we have never had trouble with smaller files from the same source, so I suspect the size is what triggers the problem. It could be that the tools used to create the file aren't 64-bit compliant, but before I go bug the creators I want to make sure the bug isn't on our side, in the gzip extraction logic.
Any thoughts would be greatly appreciated. My extraction code and testing methodology are below:
// Requires: using System.IO; using System.IO.Compression;
var fileName = @"C:\MyFile.csv.gz";

// Built-in GZipStream: stops at about 34 million lines
var msGZ = 0;
using (System.IO.Stream input = System.IO.File.OpenRead(fileName))
using (var gz = new GZipStream(input, CompressionMode.Decompress))
using (var r = new StreamReader(gz))
{
    while (!r.EndOfStream)
    {
        r.ReadLine();
        msGZ++;
    }
}

// Built-in DeflateStream: also stops at about 34 million lines
var msDF = 0;
using (System.IO.Stream input = System.IO.File.OpenRead(fileName))
using (var df = new DeflateStream(input, CompressionMode.Decompress))
using (var r = new StreamReader(df))
{
    while (!r.EndOfStream)
    {
        r.ReadLine();
        msDF++;
    }
}

// Baseline: the CSV extracted by WinRAR gives roughly 75 million lines
var csvCount = 0;
using (var csv = System.IO.File.OpenRead("UncompressedBYWinRAR.csv"))
using (var r = new StreamReader(csv))
{
    while (!r.EndOfStream)
    {
        r.ReadLine();
        csvCount++;
    }
}

// DotNetZip (Ionic.Zlib) throws this error about halfway through, at around line 34 million:
// "Destination array was not long enough. Check destIndex and length, and the array's lower bounds."
var zipNet = 0;
using (System.IO.Stream input = System.IO.File.OpenRead(fileName))
using (Stream decompressor = new Ionic.Zlib.GZipStream(input, Ionic.Zlib.CompressionMode.Decompress, true))
using (var r = new StreamReader(decompressor))
{
    while (!r.EndOfStream)
    {
        r.ReadLine();
        zipNet++;
    }
}