I am trying to compress JSON files with gzip so they can be sent to another location. The job needs to process 5,000 to 10,000 files daily, and I don't need the compressed version of each file on the local machine (they are actually being transferred to AWS S3 for long-term archiving).

Since I don't need them, I am trying to compress to a memory stream and then use that to write to AWS, rather than compress each one to disk. Whenever I try to do this, the files are broken (as in, when I open them in 7-Zip and try to open the JSON file inside, I get "Data error File is Broken").

The same thing happens when I try to write the memory stream to a local file, so I'm trying to solve that for now. Here's the code:

string[] files = Directory.GetFiles(@"C:\JSON_Logs");

foreach(string file in files)
{
    FileInfo fileToCompress = new FileInfo(file);
    using (FileStream originalFileStream = fileToCompress.OpenRead())
    {
        using (MemoryStream compressedMemStream = new MemoryStream())
        {
            using (GZipStream compressionStream = new GZipStream(compressedMemStream, CompressionMode.Compress))
            {
                originalFileStream.CopyTo(compressionStream);
                compressedMemStream.Seek(0, SeekOrigin.Begin);
                FileStream compressedFileStream = File.Create(fileToCompress.FullName + ".gz");

                //Eventually this will be the AWS transfer, but that's not important here
                compressedMemStream.WriteTo(compressedFileStream); 
            }
        }
    }      
}
weston
smullan
  • It has made a .gz file. 7-Zip will read a .gz file usually. I've tried doing it with a filestream in place of the memorystream and it works fine. – smullan May 09 '16 at 14:31

1 Answer

Rearrange your using statements so the GZipStream is definitely done by the time you read the memory stream contents:

foreach(string file in files)
{
    FileInfo fileToCompress = new FileInfo(file);
    using (MemoryStream compressedMemStream = new MemoryStream())
    {
        using (FileStream originalFileStream = fileToCompress.OpenRead())
        using (GZipStream compressionStream = new GZipStream(
            compressedMemStream, 
            CompressionMode.Compress,
            leaveOpen: true))
        {
            originalFileStream.CopyTo(compressionStream);
        }
        // The GZipStream has been disposed at this point, so the gzip data is complete.
        compressedMemStream.Seek(0, SeekOrigin.Begin);

        //Eventually this will be the AWS transfer, but that's not important here
        using (FileStream compressedFileStream = File.Create(fileToCompress.FullName + ".gz"))
        {
            compressedMemStream.WriteTo(compressedFileStream);
        }
    }
}

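Disposing a stream takes care of flushing and closing it. For a GZipStream, disposal is also what writes the final compressed block and the gzip trailer, so the MemoryStream does not hold a complete gzip file until the GZipStream has been disposed. That is why reading it inside the GZipStream's using block produced broken files.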
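
If the compressed data is wanted as a byte array rather than as an open stream, the alternative mentioned in the comments (reading a closed MemoryStream through its array) could look roughly like the sketch below. GzipSketch, CompressToBytes and DecompressToBytes are hypothetical names made up for illustration and are not part of the original code; the sketch relies only on the documented behaviour that MemoryStream.ToArray still works after the stream has been closed.

```csharp
using System.IO;
using System.IO.Compression;

static class GzipSketch
{
    // Hypothetical helper: gzip everything readable from `source`
    // and return the compressed bytes.
    public static byte[] CompressToBytes(Stream source)
    {
        using (var buffer = new MemoryStream())
        {
            // No leaveOpen here: disposing the GZipStream finishes the
            // gzip data and also closes `buffer`.
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                source.CopyTo(gzip);
            }

            // ToArray is documented to work on a closed MemoryStream.
            return buffer.ToArray();
        }
    }

    // Hypothetical round-trip helper, useful for checking the result
    // without opening it in an external tool.
    public static byte[] DecompressToBytes(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

In the loop above, this would be called with the stream returned by fileToCompress.OpenRead() inside a using block, and the resulting array could be wrapped in a new MemoryStream if the upload step expects a stream.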

Jeroen Mostert
  • Didn't notice the arrangement mistake. +1 – Jeroen van Langen May 09 '16 at 14:30
  • Then I get a system exception "cannot access closed stream" on the line compressedMemStream.WriteTo(compressedFileStream); – smullan May 09 '16 at 14:36
  • @smullan: Whoops. `GZipStream` thinks it owns the stream passed to it, which means it closes out the `MemoryStream` from under us. Rude. We tell it not to do that by passing the `leaveOpen` parameter. (We could also fix this by rearranging more code, because a `MemoryStream` can be read after closing it through its array, but I think this is more elegant.) – Jeroen Mostert May 09 '16 at 14:43