2

I have a process in Azure that generates a large number of PDF report files and stores them in blob storage. Rather than send links to all of these individually, I'm generating a single zip file and sending that link to users.

It all runs in a single process and has been working fine. Lately, though, I've been getting OutOfMemoryException errors when adding files to the zip archive, and I'm struggling to find a solution.

Below is the code I use to create the zip file (note: using the SharpZipLib library). Currently, it fails with an OutOfMemoryException after adding about 45 PDF files of roughly 3.5 MB each. The failure occurs when I hit the line: zipStream.PutNextEntry(newEntry).

Does anyone know how I could improve this process? This seems too small a zip file to be failing at this point.

Using outputMemStream As New MemoryStream()

    Using zipStream As New ICSharpCode.SharpZipLib.Zip.ZipOutputStream(outputMemStream)
        zipStream.SetLevel(7)

        Dim collD3 As UserSurveyReportCollection = GetFileList(RequestID)

        For Each entityD2 As UserSurveyReport In collD3
            Try
                Dim strF As String = entityD2.FileLocation

                'Download blob as a MemoryStream and add this stream to the zip file
                Dim msR As MemoryStream = objA.DownloadBlobAsMemoryStream(azureAccount, ReportFolder, entityD2.FileName)
                msR.Seek(0, SeekOrigin.Begin)

                'Determine the file name used for this item in the zip archive
                Dim strZipFileName As String = DetermineZipSourceName(entityD2, strFolder, strFileName)

                'Add the MemoryStream to the zip stream
                Dim newEntry As New ICSharpCode.SharpZipLib.Zip.ZipEntry(strZipFileName)
                newEntry.DateTime = DateTime.Now

                zipStream.PutNextEntry(newEntry)   'OutOfMemoryException is thrown here
                msR.CopyTo(zipStream)
                zipStream.CloseEntry()

                msR = Nothing
                zipStream.Flush()

                intCounter += 1

            Catch exZip As Exception

            End Try
        Next

        zipStream.IsStreamOwner = False
        zipStream.Finish()
        zipStream.Close()

        outputMemStream.Position = 0

        Dim bytes As Byte() = outputMemStream.ToArray()
        result.Comment = objA.UploadBlob(bytes, azureAccount, ReportFolder, entityReport.FileName).AbsolutePath

    End Using
End Using
DaveA
  • Just an observation but the error you are having seems to have nothing to do with Azure and more to do with the zip library. Or am I reading this wrong? Are you looking for a way to get a similar result without having to use that zip library? – KWilson Jun 04 '18 at 14:58
  • I think you are probably right @KWilson - this is about creating a zip archive from a largish number of files without touching the disk. Then persisting the result to blob storage. I'm not wedded to any particular zip library so if you have other suggestions, I'm all ears :) – DaveA Jun 04 '18 at 21:50

2 Answers

5

For anyone working in C# who wants to write a large zip file to blob storage:

var blob = container.GetBlockBlobReference(outputFilename);
using (var stream = await blob.OpenWriteAsync())
using (var zip = new ZipArchive(stream, ZipArchiveMode.Create))
{
    for (int i = 0; i < 2000; i++)
    {
        using (var randomStream = CreateRandomStream(2))
        {
            // Each entry is compressed and written straight through to the
            // blob's write stream, so the full archive never sits in memory.
            var entry = zip.CreateEntry($"{i}.zip", CompressionLevel.Optimal);
            using (var innerFile = entry.Open())
            {
                await randomStream.CopyToAsync(innerFile);
            }
        }
    }
}

This works surprisingly well. App memory stays around 20 MB with very low CPU, because ZipArchive in Create mode writes each entry straight through to the blob's write stream rather than buffering the whole archive. I've created very large output files (> 4.5 GB) with no problem.

Carl
  • Thanks Carl. Will give it a crack! – DaveA Feb 19 '19 at 20:24
  • 1
    Very old post, I know, but would this approach work to append to an existing zip blob file? I suspect this might have to download the existing blob entirely first, which certainly ruins the approach. – TheDoc Sep 20 '19 at 15:41
  • I think you'll need to give it a try and report back! But I would tend to agree with your assessment. – Carl Sep 20 '19 at 20:21
1

I found a solution. This approach minimises memory usage by streaming the zip archive directly to blob storage in Azure as it is created, and it uses the native System.IO.Compression library rather than a third-party zip library.

I created a class called ZipModel, which just holds a folder name, a file name and the source blob. I build a list of these and pass it into the function below. I hope this helps somebody else in the same predicament.
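
For reference, ZipModel is nothing more than a plain container along these lines (a minimal sketch; the property names match what the function below expects):

    Public Class ZipModel
        'Folder and file name used to build the entry name inside the archive
        Public Property FolderName As String
        Public Property FileName As String
        'The source blob whose contents will be compressed into the archive
        Public Property ZipBlob As CloudBlockBlob
    End Class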

    Private Function SendBlobsToZipFile(ByVal destinationBlob As CloudBlockBlob, ByVal sourceBlobs As List(Of ZipModel)) As Boolean

        Dim result As Boolean = True
        Dim resultCounter As Integer = 0

        Using blobWriteStream As Stream = destinationBlob.OpenWrite()

            Using archive As ZipArchive = New ZipArchive(blobWriteStream, ZipArchiveMode.Create)

                For Each zipM As ZipModel In sourceBlobs
                    Try
                        Dim strName As String = String.Format("{0}\{1}", zipM.FolderName, zipM.FileName)
                        Dim archiveEntry As ZipArchiveEntry = archive.CreateEntry(strName, CompressionLevel.Optimal)

                        'Stream the blob contents straight into the archive entry
                        Using archiveWriteStream As Stream = archiveEntry.Open()
                            zipM.ZipBlob.DownloadToStream(archiveWriteStream)
                            resultCounter += 1
                        End Using

                    Catch ex As Exception
                        result = False
                    End Try
                Next

            End Using
        End Using

        Return result

    End Function
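
A minimal calling sketch, assuming `container` is an existing CloudBlobContainer and the blob and folder names are placeholders:

    'Destination blob for the finished archive (name is a placeholder)
    Dim destination As CloudBlockBlob = container.GetBlockBlobReference("reports.zip")

    'Build the list of source blobs to zip up
    Dim sources As New List(Of ZipModel)()
    For Each fileName As String In New String() {"report1.pdf", "report2.pdf"}
        sources.Add(New ZipModel With {
            .FolderName = "Surveys",
            .FileName = fileName,
            .ZipBlob = container.GetBlockBlobReference(fileName)
        })
    Next

    Dim succeeded As Boolean = SendBlobsToZipFile(destination, sources)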
DaveA