
I have a process that has been working well for several months now. It recursively zips up all files and folders in a given directory and then uploads the zip file to an FTP server. It's been working, but now the zip file is exceeding 2 GB and the process errors out. Can someone please help me figure out how to get around this 2 GB limit? I commented the offending line in the code. Here is the code:

class Program
{
    // Location of upload directory
    private const string SourceFolder = @"C:\MyDirectory";
    // FTP server
    private const string FtpSite = "10.0.0.1";
    // FTP User Name
    private const string FtpUserName = "myUserName";
    // FTP Password
    private const string FtpPassword = "myPassword";

    static void Main(string[] args)
    {
        try
        {
            // Zip everything up using SharpZipLib
            string tmpFile = Path.GetTempFileName();
            var zip = new ZipOutputStream(File.Create(tmpFile));
            zip.SetLevel(8);
            ZipFolder(SourceFolder, SourceFolder, zip);
            zip.Finish();
            zip.Close();

            // Upload the zip file
            UploadFile(tmpFile);
            // Delete the zip file
            File.Delete(tmpFile);
        }
        catch (Exception ex)
        {
            throw ex;
        }
    }

    private static void UploadFile(string fileName)
    {
        string remoteFileName = "/ImagesUpload_" + DateTime.Now.ToString("MMddyyyyHHmmss") + ".zip";
        var request = (FtpWebRequest)WebRequest.Create("ftp://" + FtpSite + remoteFileName);

        request.Credentials = new NetworkCredential(FtpUserName, FtpPassword);
        request.Method = WebRequestMethods.Ftp.UploadFile;
        request.KeepAlive = false;
        request.Timeout = -1;
        request.UsePassive = true;
        request.UseBinary = true;

        // Error occurs in the next line!!!
        byte[] b = File.ReadAllBytes(fileName);
        using (Stream s = request.GetRequestStream())
        {
            s.Write(b, 0, b.Length);
        }

        using (var resp = (FtpWebResponse)request.GetResponse())
        {
        }
    }

    private static void ZipFolder(string rootFolder, string currentFolder, ZipOutputStream zStream)
    {

        string[] subFolders = Directory.GetDirectories(currentFolder);
        foreach (string folder in subFolders)
            ZipFolder(rootFolder, folder, zStream);

        string relativePath = currentFolder.Substring(rootFolder.Length) + "/";

        if (relativePath.Length > 1)
        {
            var dirEntry = new ZipEntry(relativePath) {DateTime = DateTime.Now};
        }
        foreach (string file in Directory.GetFiles(currentFolder))
        {
            AddFileToZip(zStream, relativePath, file);
        }
    }

    private static void AddFileToZip(ZipOutputStream zStream, string relativePath, string file)
    {
        var buffer = new byte[4096];
        var fi = new FileInfo(file);
        string fileRelativePath = (relativePath.Length > 1 ? relativePath : string.Empty) + Path.GetFileName(file);
        var entry = new ZipEntry(fileRelativePath) {DateTime = DateTime.Now, Size = fi.Length};
        zStream.PutNextEntry(entry);
        using (FileStream fs = File.OpenRead(file))
        {
            int sourceBytes;
            do
            {
                sourceBytes = fs.Read(buffer, 0, buffer.Length);
                zStream.Write(buffer, 0, sourceBytes);

            } while (sourceBytes > 0);
        }
    }
}
Icemanind
    Unrelated but please be aware that `throw ex;` resets the call stack of the exception. Use `throw;` if you just want to re-throw the exception as is. – Brian Rasmussen Jan 27 '15 at 22:32
  • Related to your reputation I think you should know that you should provide full information about exception. – Hamlet Hakobyan Jan 27 '15 at 22:34
  • I answered this here http://stackoverflow.com/a/4418362/372529 with an efficient method for streaming blocks at a time – phillip Jan 27 '15 at 22:38
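
A minimal illustration of Brian Rasmussen's point about `throw;` versus `throw ex;` (not from the original post; it reuses the question's UploadFile call for context):

try
{
    UploadFile(tmpFile);
}
catch (Exception)
{
    // 'throw;' re-throws the current exception and preserves its original stack trace;
    // 'throw ex;' would reset the stack trace to this catch block.
    throw;
}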

2 Answers


You are trying to allocate an array with more than 2 billion elements. .NET limits the maximum size of an array to System.Int32.MaxValue, i.e. 2 GB is the upper bound.

You're better off reading the file in pieces and uploading it in pieces, e.g. using a loop like this:

const int buflen = 128 * 1024;
byte[] buf = new byte[buflen];

using (FileStream source = new FileStream(fileName, FileMode.Open))
using (Stream dest = request.GetRequestStream())
{
    while (true)
    {
        // A zero-byte read means we've reached the end of the file
        int bytesRead = source.Read(buf, 0, buflen);
        if (bytesRead == 0) break;
        dest.Write(buf, 0, bytesRead);
    }
}
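
For what it's worth, on .NET 4 and later the same buffered copy can be written with Stream.CopyTo; this is just a sketch that assumes the request object from the question:

using (FileStream source = File.OpenRead(fileName))
using (Stream dest = request.GetRequestStream())
{
    // CopyTo moves the data in fixed-size chunks, so the whole file is never held in memory
    source.CopyTo(dest, 128 * 1024);
}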
Anya Shenanigans

The problem isn't in the zip, but in the File.ReadAllBytes call, which returns an array; arrays have a default size limit of 2 GB.

It is possible to disable this limit, as detailed here. I'm assuming you're already compiling this specifically for 64-bit to handle these kinds of file sizes. Enabling this option switches .NET over to using 64-bit addressing for arrays instead of the default 32-bit addressing.
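
The option being described is presumably the gcAllowVeryLargeObjects runtime element (available in .NET Framework 4.5 and later); a minimal app.config sketch, with the surrounding file assumed:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <!-- Permits individual objects larger than 2 GB in total size on 64-bit platforms -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>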

It would probably be better to split the archive into parts and upload them separately, however. As far as I know the built-in ZipFile class doesn't support multi-part archives, but several of the third-party libraries do.
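
As an illustration only, here is a hedged sketch using the third-party DotNetZip (Ionic.Zip) library, which can write segmented archives; the segment size and output path are made up for the example:

using (var zip = new Ionic.Zip.ZipFile())
{
    // Add the whole upload directory (SourceFolder as in the question)
    zip.AddDirectory(SourceFolder);
    // Emit roughly 1 GB segments: ImagesUpload.zip, ImagesUpload.z01, ImagesUpload.z02, ...
    zip.MaxOutputSegmentSize = 1024 * 1024 * 1024;
    zip.Save(@"C:\Temp\ImagesUpload.zip");
}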


Edit: I was thinking about the resulting zip output, rather than the input. To load a huge amount of data into the zip file, you should use the buffer-based approach suggested by Petesh and philip.

Octopoid
  • ooh, I wasn't aware that you could override the limit on the 64bit VM. good to know it's possible – Anya Shenanigans Jan 27 '15 at 22:37
  • you don't want to disable the limit. this code is not efficient at all. he needs to read the file into a buffer - send the buffer - repeat till done. then tackle any server file size limit after that is working. – phillip Jan 27 '15 at 22:38
  • Yes indeed, it's not heavily publicized for good reason! Just to be clear, while it is technically possible, it should be considered a last resort. Petesh and philip are quite right, in this case you can and should just load the file incrementally with a buffer and 'avoid the limit' that way. – Octopoid Jan 27 '15 at 22:41
  • you can still run into a limit based on IIS configuration but you should be fine for long time so google that when you cross that bridge – phillip Jan 27 '15 at 22:43
  • Yeah, when that time comes around, I'd suggest looking into multi-part archives. – Octopoid Jan 27 '15 at 22:44
  • Thank you for the information. Although, disabling the 2gig limit might be a super quick fix for now, I want to do it right and convert it into a buffered stream. – Icemanind Jan 27 '15 at 23:06