
I have a rails webapp that allows users to upload videos, where they are stored in an NFS-mounted directory.

The current setup is fine for smaller files, but I need to support large file uploads as well (up to 4 GB). When I try to upload a 4 GB file, it eventually completes, but the UX is awful: the upload starts and progress is displayed based on XHR 'progress' events, but after it hits 100%, there is still a long wait (5+ minutes) before the server responds to the request.

Initially I thought this had to do with copying the file from some temp directory over to the final NFS-mounted directory. But now I'm not so sure. After adding logging to my routes, I see that there is about a 3-minute wait between when the file upload progress reaches 100% and when the code in my controller action runs (before I do any handling for moving the file to the NAS).

I'm wondering the following:

  • What is happening during this 3 minute wait after the upload completes and before my action is called?
  • Is there a way for me to account for whatever is going on during this period so that the client gets a response immediately after the upload completes and doesn't time out?
  • How are large file uploads typically handled in Rails? This seems like it would be a common problem, but I can't seem to find anything on it.

(Note: I was originally using CarrierWave for uploads when I discovered this problem. I removed it and simply handled the file save using FileUtils directly in my model just to make sure the wait times weren't the result of some CarrierWave magic happening behind the scenes, but got exactly the same result.)

ruby -v: 1.9.3p362

rails -v: 3.2.11

Danny
  • Can you include your log file? Also, what server are you on? – Dan Grahn Aug 20 '13 at 18:26
  • I could attach logs, but I'm not sure it would really help. Absolutely nothing happens in the logs during the period of time in question (after upload completes, before action runs). – Danny Aug 20 '13 at 18:46
  • If it isn't even getting to your controller action, and you don't have any crazy before_filters or around_filters, then it's gotta be either your web server or your middleware. Anything weird in `rake middleware`? – Taavo Aug 23 '13 at 20:31
  • Also, when I've run into similar problems before, it's usually because somebody was copying the uploaded file instead of moving it. – Taavo Aug 23 '13 at 20:57

2 Answers


You might consider using MiniProfiler to get a better sense of where the time is being spent.

Large file uploads need to be handled in the background. Your controller should simply record that the file was uploaded, then queue a background job to move it and perform any other processing that needs to happen.

http://mattgrande.com/2009/08/11/delayedjob/

That article has the gist of it; every implementation is going to be different.
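To make the pattern concrete, here's a minimal, self-contained sketch in plain Ruby (the names and the thread-backed queue are mine, standing in for Delayed Job or similar): the request handler records where the file should end up and returns immediately, while a worker does the slow move.

```ruby
require 'fileutils'

# Stand-in for a real background queue (Delayed Job, Resque, etc.).
JOBS = Queue.new

WORKER = Thread.new do
  while (job = JOBS.pop)
    break if job == :stop
    job.call
  end
end

# Controller-style logic: respond immediately, defer the expensive move.
def finish_upload(tmp_path, final_dir)
  final_path = File.join(final_dir, File.basename(tmp_path))
  JOBS << lambda do
    FileUtils.mkdir_p(final_dir)
    # mv rather than cp: a same-filesystem move is a cheap rename.
    FileUtils.mv(tmp_path, final_path)
  end
  final_path # the client gets this back right away
end
```

In a real Rails app the lambda would be a Delayed Job job class, but the shape is the same: the controller's only responsibility is to enqueue and respond.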

Nick Veys
  • I'll look into MiniProfiler. I did actually try Delayed Job with CarrierWave, but unfortunately there was still that window of time before any of my code ran, and before I could even queue the file copy. – Danny Aug 20 '13 at 18:42
  • The idea here is that the upload shouldn't even hit your application—you configure apache/nginx to just accept the file and put it somewhere for you. You use javascript to submit the path or URL of the uploaded file to your application, which then queues the job to actually process the upload. The [s3 direct uploader](https://github.com/waynehoover/s3_direct_upload) gem uses a similar technique on s3, without the backgrounding. – Taavo Aug 23 '13 at 20:54
  • Woah, someone other than me linked to my blog. Neat! – Matt Grande Oct 17 '13 at 13:53

I finally found the answer to my main question: What is happening during this 3 minute wait after the upload completes and before my action is called?

It's all explained very clearly in this post: The Rails Way - Uploading Files

"When a browser uploads a file, it encodes the contents in a format called ‘multipart mime’ (it’s the same format that gets used when you send an email attachment). In order for your application to do something with that file, rails has to undo this encoding. To do this requires reading the huge request body, and matching each line against a few regular expressions. This can be incredibly slow and use a huge amount of CPU and memory."

I tried the modporter Apache module mentioned in the post. The only problem is that the module and its corresponding plugin were written 4 years ago, and with their website no longer in operation, there's almost no documentation on either one.

With modporter, I wanted to specify my NFS-mounted directory as the PorterDir, in the hopes that it would pass the file right along to the NAS without any extra copying from a temp directory. However, I was not able to get this far since the module seemed to be ignoring my specified PorterDir, and was returning a completely different path to my actions. On top of that, the path it was returning didn't even exist, so I had no idea what was actually happening to my uploads.

My Workaround

I had to get the problem solved quickly, so for now I went with a somewhat hacky solution: corresponding JavaScript and Ruby code to handle chunked file uploads.

JS Example:

var MAX_CHUNK_SIZE = 20000000; // in bytes

window.FileUploader = function (opts) {
    var file = opts.file;
    var url = opts.url;
    var current_byte = 0;
    var success_callback = opts.success;
    var progress_callback = opts.progress;
    var percent_complete = 0;
    var paused = false;
    var upload_id = null;

    this.start = this.resume = function () {
        paused = false;
        upload();
    };

    this.pause = function () {
        paused = true;
    };

    function upload() {
        if (paused) return; // halt the chunk loop until resume() is called

        var chunk = file.slice(current_byte, current_byte + MAX_CHUNK_SIZE);
        var fd = new FormData();
        fd.append('chunk', chunk);
        fd.append('filename', file.name);
        fd.append('total_size', file.size);
        fd.append('start_byte', current_byte);

        $.ajax(url, {
            type: 'post',
            data: fd,
            processData: false,  // required when posting FormData via jQuery
            contentType: false,
            success: function (data) {
                current_byte = data.next_byte;
                upload_id = data.upload_id;

                if (data.path) {
                    success_callback(data.path);
                }
                else {
                    percent_complete = Math.round(current_byte / file.size * 100);
                    if (percent_complete > 100) percent_complete = 100;
                    progress_callback(percent_complete); // update some UI element to provide feedback to user
                    upload(); // send the next chunk
                }
            }
        });
    }
};

(forgive any syntax errors, just typing this off the top of my head)

Server-side, I created a new route to accept the file chunks. On first chunk submission, I generate an upload_id based on filename/size, and determine if I already have a partial file from an interrupted upload. If so, I pass back the next starting byte I need along with the id. If not, I store the first chunk and pass back the id.

The process continues, with additional chunk uploads appending to the partial file until its size matches the original file size. At that point, the server responds with the temporary path to the file.

The javascript then removes the file input from the form, and replaces it with a hidden input whose value is the file path returned from the server, and then posts the form.

Then finally server-side, I handle moving/renaming the file and saving its final path to my model.
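For illustration, a stripped-down version of that server-side chunk handling might look like this (the staging directory and method names are hypothetical, not from my actual app; the returned hash mirrors the JSON the JavaScript above consumes):

```ruby
require 'digest'
require 'fileutils'

TMP_DIR = '/tmp/chunked_uploads' # hypothetical staging directory

# Derive a stable id from filename + total size, as described above,
# so an interrupted upload maps back to the same partial file.
def upload_id_for(filename, total_size)
  Digest::MD5.hexdigest("#{filename}-#{total_size}")
end

# Append one chunk; returns { next_byte:, upload_id: } plus :path once complete.
def receive_chunk(filename, total_size, start_byte, chunk_data)
  id = upload_id_for(filename, total_size)
  FileUtils.mkdir_p(TMP_DIR)
  partial = File.join(TMP_DIR, id)

  current = File.exist?(partial) ? File.size(partial) : 0
  # Only append when this chunk starts where the partial file ends
  # (a resumed upload may resend a chunk we already have).
  if start_byte == current
    File.open(partial, 'ab') { |f| f.write(chunk_data) }
    current += chunk_data.bytesize
  end

  result = { next_byte: current, upload_id: id }
  result[:path] = partial if current >= total_size
  result
end
```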

Phew.

Danny
  • As a heads up to future readers, FileUploader is only supported by Chrome and will not be added to the html5 spec http://www.html5rocks.com/en/tutorials/file/filesystem/ – Rick Smith Feb 19 '15 at 20:42