
I have a .NET Core 1.1 application that has a problem generating a List of objects that each contain a byte array. If there are more than about 20 items in the list (I'm not sure of the exact number or size at which it fails), the method throws an OutOfMemoryException. The method is below:

public async Task<List<Blob>> GetBlobsAsync(string container)
{
    List<Blob> retVal = new List<Blob>();
    BlobContinuationToken continuationToken = null;

    CloudBlobContainer cont = _cbc.GetContainerReference(container);

    do
    {
        // Fetch the next segment of blob metadata; the listing call runs
        // inside the loop so that every segment is retrieved.
        BlobResultSegment resultSegment = await cont.ListBlobsSegmentedAsync(
            String.Empty, true, BlobListingDetails.Metadata, null, continuationToken, null, null);

        foreach (var bItem in resultSegment.Results)
        {
            var iBlob = bItem as CloudBlockBlob;

            // Allocates a buffer the full size of the blob; this is the
            // allocation where the OutOfMemoryException surfaces.
            var itrBlob = new Blob()
            {
                Contents = new byte[iBlob.Properties.Length],
                Name = iBlob.Name,
                ContentType = iBlob.Properties.ContentType
            };

            await iBlob.DownloadToByteArrayAsync(itrBlob.Contents, 0);

            retVal.Add(itrBlob);
        }

        continuationToken = resultSegment.ContinuationToken;

    } while (continuationToken != null);

    return retVal;
}

I'm not using anything that can really be disposed in the method. Is there a better way to accomplish this? The ultimate goal is to pull all of these files and then create a ZIP archive. This process works as long as I don't breach some size threshold.

If it helps, the application is accessing Azure Block Blob Storage from an Azure Web Application instance. Maybe there is a setting I need to adjust to increase a threshold?

The exception is thrown when the Blob() object is instantiated.

EDIT: The question as posted was admittedly light on detail. The problem container has 30 files (mostly large text files that compress well). The total size of the container is 971MB. The request runs for approximately 40 seconds before reporting an HTTP 500 error and the referenced exception.

When I debug locally and step through the same operation it succeeds, resulting in a 237MB zip file. During the operation I can see the memory usage shoot over 2GB by the time the list is created.

I tried to abstract the blob storage interaction into its own service, but perhaps I've made this more difficult for myself than necessary.

randcd
  • Are you running this in 32-bit mode and blowing the 32-bit limit? Anyway, if these files are so big, why are you not saving them to some persistent storage and passing the link around, rather than keeping the whole files in memory? – Michal Ciechan Oct 13 '17 at 21:03
  • The problem container has 30 files in it totaling 971MB. For the most part, these files _are_ referenced by a link, but this function is to download all related files as a ZIP file. Most file collections are less than 50MB, this one is just substantially larger than the rest. The application runs in 64-bit mode. Even then though, I'm still under 2GB. – randcd Oct 14 '17 at 03:31
  • As Michal said, the problem is most likely that you are running in 32-bit mode on Azure; it's the default. As you said, you can see the memory go above 2GB when you run it locally, which is not surprising because you are downloading the blobs to memory and copying them to byte arrays. – flytzen Oct 14 '17 at 06:15
  • But... the 64-bit thing is just a workaround. In short, you should download each blob as a stream and immediately write it to the zip writer, then dispose of it. You can also get the zip writer to write to a temp file instead of a memory stream (a sketch of this appears after these comments). When it comes to IO and memory pressure, abstraction is not your friend :) – flytzen Oct 14 '17 at 06:28
  • Do you still need help with this? Based on your answer I could offer code samples to make this work with a low memory footprint. Beyond that, there is room for architectural guidance on why creating a zip file this way in a web application usually leads to HTTP request timeouts and scales very poorly. – Sascha Gottfried Oct 18 '17 at 08:42
  • @SaschaGottfried I definitely understand (especially now) why this isn't a great choice architecturally. I wasn't considering the fact that I actually was creating 3x the objects I was intending to. I am refactoring this approach and if you have any samples or blogs or guidance I would be appreciative. – randcd Oct 19 '17 at 14:57
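
To make flytzen's suggestion concrete, here is a minimal sketch of the temp-file variant, assuming the same _cbc (CloudBlobClient) field as in the question; the method name is made up, and the caller is responsible for streaming out and deleting the temp file (requires System.IO, System.IO.Compression and System.Linq):

public async Task<string> WriteBlobsToZipFileAsync(string container)
{
    CloudBlobContainer cont = _cbc.GetContainerReference(container);
    string zipPath = Path.GetTempFileName();

    using (var file = new FileStream(zipPath, FileMode.Create))
    using (var za = new ZipArchive(file, ZipArchiveMode.Create))
    {
        BlobContinuationToken token = null;
        do
        {
            var segment = await cont.ListBlobsSegmentedAsync(
                String.Empty, true, BlobListingDetails.None, null, token, null, null);

            foreach (var blob in segment.Results.OfType<CloudBlockBlob>())
            {
                using (var dest = za.CreateEntry(blob.Name).Open())
                {
                    // Streams the blob into the zip entry in chunks; only one
                    // buffer's worth of data is in memory at any time.
                    await blob.DownloadToStreamAsync(dest);
                }
            }

            token = segment.ContinuationToken;
        } while (token != null);
    }

    return zipPath; // caller streams this file out and deletes it afterwards
}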

2 Answers


Found these two code samples that illustrate the concept well and support your use case.

ZIP compression level:

zipOutputStream.SetLevel(3); //0-9, 9 being the highest level of compression
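
(SetLevel is SharpZipLib's API. If you stay with System.IO.Compression.ZipArchive, as the question's code does, the rough equivalent is to pass a CompressionLevel when creating each entry; the entry name below is just a placeholder.)

var entry = zipArchive.CreateEntry("file.txt", CompressionLevel.Fastest); // Optimal, Fastest or NoCompression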

End-to-end example using ASP.NET WebApi

  • a Zip feature can be added to this well-structured application

Further reading

Sascha Gottfried
  • Hope you benefit from the given examples; I haven't had time to compile working sample code myself yet, maybe later on if you have problems applying this to your app. – Sascha Gottfried Oct 20 '17 at 07:29

Using Sascha's answer, I was able to make a compromise that seems to perform decently given the parameters. Probably not perfect, but it cuts the memory usage by nearly 70% and allows me to keep some abstraction.

I added a method to my blob service called GetBlobsAsZipAsync that accepts a container name as an argument:

public async Task<Stream> GetBlobsAsZipAsync(string container)
{
    byte[] buffer = new byte[4194304]; // shared 4MB copy buffer
    MemoryStream ms = new MemoryStream();
    BlobContinuationToken continuationToken = null;

    CloudBlobContainer cont = _cbc.GetContainerReference(container);

    // leaveOpen: true keeps the MemoryStream usable after the archive
    // is disposed (disposal is what finalizes the zip structure).
    using (var za = new ZipArchive(ms, ZipArchiveMode.Create, true))
    {
        do
        {
            // Fetch each segment inside the loop so containers that span
            // multiple segments are fully enumerated.
            BlobResultSegment resultSegment = await cont.ListBlobsSegmentedAsync(
                String.Empty, true, BlobListingDetails.Metadata, null, continuationToken, null, null);

            foreach (var bItem in resultSegment.Results)
            {
                var iBlob = bItem as CloudBlockBlob;
                var ze = za.CreateEntry(iBlob.Name);

                // Copy the blob into the archive entry 4MB at a time
                // instead of materializing the whole blob in memory.
                using (var fs = await iBlob.OpenReadAsync())
                using (var dest = ze.Open())
                {
                    int count = await fs.ReadAsync(buffer, 0, buffer.Length);
                    while (count > 0)
                    {
                        await dest.WriteAsync(buffer, 0, count);
                        count = await fs.ReadAsync(buffer, 0, buffer.Length);
                    }
                }
            }

            continuationToken = resultSegment.ContinuationToken;

        } while (continuationToken != null);
    }

    return ms;
}

This returns the zip as a MemoryStream (the archive itself has been closed and flushed, but the stream is left open) that is then returned as a byte array using a FileResult:

[HttpPost]
public async Task<IActionResult> DownloadFiles(string container, int projectId, int? profileId)
{
    // Record the download before building the archive.
    _ctx.Add(new ProjectDownload() { ProfileId = profileId, ProjectId = projectId });
    await _ctx.SaveChangesAsync();

    using (var ms = (MemoryStream)await _blobs.GetBlobsAsZipAsync(container))
    {
        // ToArray() copies the finished archive into a byte array for the FileResult.
        return File(ms.ToArray(), "application/zip", "download.zip");
    }
}
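
If the extra ToArray() copy ever becomes a problem, a possible refinement (a sketch; untested) is to rewind the MemoryStream and hand it to the stream-based File overload, which MVC disposes after the response is written:

// Sketch: avoid the ToArray() copy by returning the stream itself.
var ms = (MemoryStream)await _blobs.GetBlobsAsZipAsync(container);
ms.Position = 0; // rewind before handing the stream to MVC

// FileStreamResult takes ownership of the stream and disposes it
// after the response body has been written, so no using block is needed.
return File(ms, "application/zip", "download.zip");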

I hope this is useful to someone else who just needs a push in the right direction. I took a lazy way out on this originally and it came back to bite me.

randcd