I'm writing a console app to compress a directory of about 30 large files, each around 100-300 MB. It will run once per day, as new files come in. I've tried the built-in GZipStream class; it took about 15 seconds per file, with a compression ratio of about 0.212. Is there a more efficient way to do this with third-party libraries, or some way to improve the compression ratio? Finally, is threading an option to speed this process up?

Here's the code I'm currently using (it's basically from the MSDN article on GZipStream):

private void CompressFile(FileInfo fileInfo)
{
    // Skip hidden files and files that are already compressed.
    // Checking first avoids opening a file we will not compress.
    bool isHidden = (fileInfo.Attributes & FileAttributes.Hidden) == FileAttributes.Hidden;
    if (isHidden || fileInfo.Extension == ".gz")
    {
        return;
    }

    string outPath = fileInfo.FullName + ".gz";

    // Timer is a field defined elsewhere (presumably a Stopwatch).
    Timer.Reset();

    using (FileStream inFile = fileInfo.OpenRead())
    using (FileStream outFile = File.Create(outPath))
    using (GZipStream compress = new GZipStream(outFile, CompressionMode.Compress))
    {
        // Copy the source file into the compression stream.
        Timer.Start();
        inFile.CopyTo(compress);
        Timer.Stop();
    }

    // Read the output size only after the streams are disposed,
    // so that all compressed data has been flushed to disk.
    long compressedLength = new FileInfo(outPath).Length;

    Console.WriteLine("Compressed {0} from {1} to {2} bytes in {3} seconds.",
        fileInfo.Name, fileInfo.Length, compressedLength, Timer.ElapsedMilliseconds / 1000.0);
}

Thanks!

Hershizer33
  • Is there any known overlap between the files, anything else you can use to logically compress content further? – BrokenGlass Nov 11 '11 at 13:44
  • Just let Windows do this. Right-click the folder, Properties, Advanced, tick "Compress contents" option. – Hans Passant Nov 11 '11 at 13:46
  • @HansPassant So the potential performance gains of programming it are not worth the hassle? (Also, I assume there's a way to schedule Windows to do it?) – Hershizer33 Nov 11 '11 at 13:51
  • Windows can do this a *lot* more efficiently; it happens on a background kernel worker thread. No scheduling required: you turn it on and it will always be active. File data is compressed when it is written to disk and automatically decompressed when you read it. – Hans Passant Nov 11 '11 at 14:01

2 Answers

This answer, "Is it safe to call ICsharpCode.SharpZipLib in parallel on multiple threads", gives some comparisons of GZIP compression alternatives.

Your data is large enough that you could benefit from doing compression in parallel.

This sample code does the parallel compression.

Compared to the built-in GZipStream, the parallel approach takes about half the time and yields slightly better compression.
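
For illustration only, here is a minimal sketch of a simpler, different kind of parallelism than the one described above: it compresses several files at once with the built-in GZipStream, rather than splitting one file across threads as the DotNetZip parallel stream does. The names ParallelGzip, CompressFile, CompressDirectory and maxParallel are hypothetical, and whether this helps depends on whether the job is CPU-bound or disk-bound, so it should be measured.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelGzip
{
    // Compresses one file to "<path>.gz" and returns the output path.
    public static string CompressFile(string path)
    {
        string outPath = path + ".gz";
        using (FileStream inFile = File.OpenRead(path))
        using (FileStream outFile = File.Create(outPath))
        using (var gzip = new GZipStream(outFile, CompressionMode.Compress))
        {
            inFile.CopyTo(gzip);
        }
        return outPath;
    }

    // Compresses every non-hidden, non-.gz file in a directory,
    // running at most maxParallel compressions at the same time.
    public static void CompressDirectory(string directory, int maxParallel)
    {
        var files = new DirectoryInfo(directory).GetFiles()
            .Where(f => (f.Attributes & FileAttributes.Hidden) == 0 && f.Extension != ".gz")
            .Select(f => f.FullName)
            .ToList();

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(files, options, path => { CompressFile(path); });
    }
}
```

A call such as ParallelGzip.CompressDirectory(@"C:\data\incoming", Environment.ProcessorCount) would be one way to try it; the directory path here is a placeholder.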

DotNetZip also has classes for BZip2 compression (including a parallel implementation). BZip2 is much slower than GZIP, but gives you a better compression ratio.

Cheeso
  • Ended up doing parallel compression for my code, but might just do Hans' suggestion of letting Windows handle it. – Hershizer33 Nov 11 '11 at 14:38

There is no generic way. You need to profile it for the

  • payload
  • file system
  • CPU load and capacity

You could pass the level parameter (a CompressionLevel value) to the GZipStream constructor; that overload is available from .NET Framework 4.5.
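
As a rough sketch of what passing a level looks like, the helper below compresses a buffer in memory at a given CompressionLevel and reports the compressed size, so levels can be compared on a sample of the real payload. The names GzipLevels and CompressedSize are hypothetical, and the constructor overload used requires .NET Framework 4.5 or later.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GzipLevels
{
    // Compresses a buffer in memory at the given level and
    // returns the compressed size in bytes.
    public static long CompressedSize(byte[] data, CompressionLevel level)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the
            // GZipStream is disposed, which flushes the final bytes.
            using (var gzip = new GZipStream(output, level, leaveOpen: true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.Length;
        }
    }
}
```

Comparing CompressedSize(sample, CompressionLevel.Fastest) against CompressedSize(sample, CompressionLevel.Optimal), with a Stopwatch around each call, gives a quick size and time comparison for a particular payload.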

I'd consider using pre-existing (external) tools to do the job. You'll get comparison benchmarks much more quickly, because you don't have to implement each approach yourself. I'd really suggest the Unix-like tools, but you might have trouble finding them for your Windows platform.

sehe
  • Regarding "you might have trouble finding..." There's a tar for Windows that does GZIP compression at http://cheeso.members.winisp.net/srcview.aspx?dir=Tar . There's a parallel bzip library included in DotNetZip. – Cheeso Nov 11 '11 at 13:36
  • Note that I linked to several tools there too. – sehe Nov 11 '11 at 13:38