6

I am attempting to compress a large amount of data, sometimes in the region of 100GB, when i run the routine i have written it appears the file comes out exactly the same size as the previous size. Has anyone else had this issue with the GZipStream?

My code is as follows:

        byte[] buffer = BitConverter.GetBytes(StreamSize);
        FileStream LocalUnCompressedFS = File.OpenWrite(ldiFileName);
        LocalUnCompressedFS.Write(buffer, 0, buffer.Length);
        GZipStream LocalFS = new GZipStream(LocalUnCompressedFS, CompressionMode.Compress);
        buffer = new byte[WriteBlock];
        UInt64 WrittenBytes = 0;
        while (WrittenBytes + WriteBlock < StreamSize)
        {
            fromStream.Read(buffer, 0, (int)WriteBlock);
            LocalFS.Write(buffer, 0, (int)WriteBlock);
            WrittenBytes += WriteBlock;
            OnLDIFileProgress(WrittenBytes, StreamSize);
            if (Cancel)
                break;
        }
        if (!Cancel)
        {
            double bytesleft = StreamSize - WrittenBytes;
            fromStream.Read(buffer, 0, (int)bytesleft);
            LocalFS.Write(buffer, 0, (int)bytesleft);
            WrittenBytes += (uint)bytesleft;
            OnLDIFileProgress(WrittenBytes, StreamSize);
        }
        LocalFS.Close();
        fromStream.Close();

The StreamSize is an 8 byte UInt64 value that holds the size of the file. i write these 8 bytes raw to the start of the file so i know the original file size. Writeblock has the value of 32kb (32768 bytes). fromStream is the stream to take data from, in this instance, a FileStream. Is the 8 bytes infront of the compressed data going to cause an issue?

Skintkingle
  • 1,579
  • 3
  • 16
  • 28

2 Answers2

5

I ran a test using the following code for compression and it ran without issue on a 7GB and 12GB file (both known beforehand to compress "well"). Does this version work for you?

const string toCompress = @"input.file";
var buffer = new byte[1024*1024*64];

using(var compressing = new GZipStream(File.OpenWrite(@"output.gz"), CompressionMode.Compress))
using(var file = File.OpenRead(toCompress))
{
    var bytesRead = 0;
    while(bytesRead < buffer.Length)
    {
        bytesRead = file.Read(buffer, 0, buffer.Length);
        compressing.Write(buffer, 0, buffer.Length);
    }
}

Have you checked out the documentation?

The GZipStream class cannot decompress data that results in over 8 GB of uncompressed data.

You probably need to find a different library that will support your needs or attempt to break your data up into <=8GB chunks that can safely be "sewn" back together.

Austin Salonen
  • 49,173
  • 15
  • 109
  • 139
  • 2
    Hi Austin, thanks for the answer. My program will not be decompressing so i dont think this matters? unless it's an 8gb limit to compression too. – Skintkingle May 16 '12 at 15:37
  • Hmm... what if you need more than that? Are there other options available? It seems strange that a stream would have that sort of limitation. – Jeremy Holovacs May 16 '12 at 15:38
  • That's talking about decompression, the OP is talking about compression. – Igby Largeman May 16 '12 at 15:39
  • @Skintkingle: How are you testing the validity of your code now? – Austin Salonen May 16 '12 at 15:41
  • Hi Austin, I have not confirmed the code compression works on smaller data sets although i cannot see that code snippet 'not' working. One second let me set StreamSize to something small so it only takes a portion of the data and see if the compressed size is smaller. – Skintkingle May 16 '12 at 16:18
  • I will try your new solution tomorrow when I'm back at work. :) – Skintkingle May 16 '12 at 20:30
  • Curiously, When putting the large data into 6gb data chunks and writing to file compression happened. It looks like there may be a performance hit on Compression aswell as decompression. I will mark your answer as the solution as it mentions an 8gb limit aswell. – Skintkingle May 22 '12 at 10:42
1

Austin Salonen's code doesn't work for me (buggy, 4GB error).

Here's the proper way:

using System;
using System.Collections.Generic;
using System.Text;

namespace CompressFile
{
    class Program
    {


        static void Main(string[] args)
        {
            string FileToCompress = @"D:\Program Files (x86)\msvc\wkhtmltopdf64\bin\wkhtmltox64.dll";
            FileToCompress = @"D:\Program Files (x86)\msvc\wkhtmltopdf32\bin\wkhtmltox32.dll";
            string CompressedFile = System.IO.Path.Combine(
                 System.IO.Path.GetDirectoryName(FileToCompress)
                ,System.IO.Path.GetFileName(FileToCompress) + ".gz"
            );


            CompressFile(FileToCompress, CompressedFile);
            // CompressFile_AllInOne(FileToCompress, CompressedFile);

            Console.WriteLine(Environment.NewLine);
            Console.WriteLine(" --- Press any key to continue --- ");
            Console.ReadKey();
        } // End Sub Main


        public static void CompressFile(string FileToCompress, string CompressedFile)
        {
            //byte[] buffer = new byte[1024 * 1024 * 64];
            byte[] buffer = new byte[1024 * 1024]; // 1MB

            using (System.IO.FileStream sourceFile = System.IO.File.OpenRead(FileToCompress))
            {

                using (System.IO.FileStream destinationFile = System.IO.File.Create(CompressedFile))
                {

                    using (System.IO.Compression.GZipStream output = new System.IO.Compression.GZipStream(destinationFile,
                        System.IO.Compression.CompressionMode.Compress))
                    {
                        int bytesRead = 0;
                        while (bytesRead < sourceFile.Length)
                        {
                            int ReadLength = sourceFile.Read(buffer, 0, buffer.Length);
                            output.Write(buffer, 0, ReadLength);
                            output.Flush();
                            bytesRead += ReadLength;
                        } // Whend

                        destinationFile.Flush();
                    } // End Using System.IO.Compression.GZipStream output

                    destinationFile.Close();
                } // End Using System.IO.FileStream destinationFile 

                // Close the files.
                sourceFile.Close();
            } // End Using System.IO.FileStream sourceFile

        } // End Sub CompressFile


        public static void CompressFile_AllInOne(string FileToCompress, string CompressedFile)
        {
            using (System.IO.FileStream sourceFile = System.IO.File.OpenRead(FileToCompress))
            {
                using (System.IO.FileStream destinationFile = System.IO.File.Create(CompressedFile))
                {

                    byte[] buffer = new byte[sourceFile.Length];
                    sourceFile.Read(buffer, 0, buffer.Length);

                    using (System.IO.Compression.GZipStream output = new System.IO.Compression.GZipStream(destinationFile,
                        System.IO.Compression.CompressionMode.Compress))
                    {
                        output.Write(buffer, 0, buffer.Length);
                        output.Flush();
                        destinationFile.Flush();
                    } // End Using System.IO.Compression.GZipStream output

                    // Close the files.        
                    destinationFile.Close();
                } // End Using System.IO.FileStream destinationFile 

                sourceFile.Close();
            } // End Using System.IO.FileStream sourceFile

        } // End Sub CompressFile


    } // End Class Program


} // End Namespace CompressFile
Stefan Steiger
  • 78,642
  • 66
  • 377
  • 442