13

I have a REST frontend written in Python/Bottle that handles file uploads, usually large ones. The API is written in such a way that:

The client sends a PUT request with the file as the payload. Among other things, it sends Date and Authorization headers. This is a security measure against replay attacks: the request is signed with a temporary key, using the target URL, the date and several other things.

Now the problem. The server accepts the request only if the supplied date falls within a 15-minute window. If the upload takes long enough, the request ends up outside that window. Request authorization is handled by a decorator on the Bottle view method. However, Bottle won't start dispatching until the upload has finished, so validation fails on longer uploads.

My question is: is there a way to tell Bottle or WSGI to handle the request immediately and stream the upload as it arrives? This would be useful to me for other reasons as well. Or are there any other solutions? As I am writing this, WSGI middleware comes to mind, but I'd still like outside insight.

I would be willing to switch to Flask, or even other Python frameworks, as the REST frontend is quite lightweight.

Thank you

Tomáš Plešek

2 Answers

20

I recommend splitting the incoming file into smaller-sized chunks on the frontend. I'm doing this to implement a pause/resume function for large file uploads in a Flask application.

Using Sebastian Tschan's jQuery plugin, you can implement chunking by specifying a maxChunkSize when initializing the plugin, as in:

$('#file-select').fileupload({
    url: '/uploads/',
    sequentialUploads: true,
    done: function (e, data) {
        console.log("uploaded: " + data.files[0].name)
    },
    maxChunkSize: 1000000 // 1 MB
});

Now the client will send multiple requests when uploading large files, and your server-side code can use the Content-Range header to patch the original large file back together. For a Flask application, the view might look something like:

import os

from flask import Flask, jsonify, request

app = Flask(__name__)

# Upload files
@app.route('/uploads/', methods=['POST'])
def results():

    files = request.files

    # assuming only one file is passed in the request
    # (keys() is not indexable in Python 3, so take the first key explicitly)
    key = next(iter(files.keys()))
    value = files[key] # this is a Werkzeug FileStorage object
    filename = value.filename

    if 'Content-Range' in request.headers:
        # extract starting byte from Content-Range header string,
        # e.g. "bytes 0-999999/5000000"
        range_str = request.headers['Content-Range']
        start_bytes = int(range_str.split(' ')[1].split('-')[0])

        # write the chunk at its offset, creating the file if needed;
        # 'r+b' honours seek(), whereas 'a' always appends at the end
        mode = 'r+b' if os.path.exists(filename) else 'wb'
        with open(filename, mode) as f:
            f.seek(start_bytes)
            f.write(value.stream.read())

    else:
        # this is not a chunked request, so just save the whole file
        value.save(filename)

    # send response with appropriate mime type header
    return jsonify({"name": value.filename,
                    "size": os.path.getsize(filename),
                    "url": 'uploads/' + value.filename,
                    "thumbnail_url": None,
                    "delete_url": None,
                    "delete_type": None,})
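
The header parsing in the view above trusts the client to send a well-formed value. A minimal sketch of a stricter parser follows, assuming the header has the form `bytes start-end/total`; `parse_content_range` and `is_last_chunk` are hypothetical helper names, not part of Flask or Werkzeug.

```python
import re

# Hypothetical helpers (not part of Flask or Werkzeug).
# Assumes a header of the form "bytes 0-999999/5000000".
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def parse_content_range(header):
    """Return (start, end, total) as integers, or raise ValueError."""
    match = _CONTENT_RANGE.match(header.strip())
    if match is None:
        raise ValueError("unsupported Content-Range: %r" % header)
    start, end, total = (int(group) for group in match.groups())
    if start > end or end >= total:
        raise ValueError("inconsistent Content-Range: %r" % header)
    return start, end, total


def is_last_chunk(header):
    """True when the chunk ends at the final byte of the file."""
    _, end, total = parse_content_range(header)
    return end == total - 1
```

Knowing when the last chunk has arrived may be useful if the assembled file needs post-processing, such as a checksum or a move to permanent storage.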

For your particular application, you will just have to make sure that the correct auth headers are still sent with each request.
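
Because each chunk is a separate, short request, the 15-minute window from the question can be checked per chunk. Here is a rough sketch of that freshness check only, assuming the client sends a standard HTTP Date header with every chunk; `date_header_is_fresh` and `MAX_SKEW` are illustrative names, and verifying the signature in the Authorization header is left out.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Window taken from the question; adjust to match your own policy.
MAX_SKEW = timedelta(minutes=15)


def date_header_is_fresh(date_header, now=None):
    """Return True if an HTTP Date header lies within MAX_SKEW of `now`."""
    now = now or datetime.now(timezone.utc)
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable header: treat the request as stale.
        return False
    if sent.tzinfo is None:
        # Assumption: a date without zone information is treated as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return abs(now - sent) <= MAX_SKEW
```

In a Flask view this could be called as `date_header_is_fresh(request.headers.get('Date', ''))` before the chunk is written to disk.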

Hope this helps! I was struggling with this problem for a while ;)

petrus-jvrensburg
  • I will add that on some operating systems (in my case Ubuntu 14.10), if you do open(filename, 'a'), then seek() will not move your pointer. Appending will be enforced and you will always attach incoming chunk to the end of file. – Drachenfels Mar 24 '15 at 17:37
  • @petrus-jvrensburg Your answer is great for my need, but I'm wondering, how can Flask not mix the request in the case that two users upload the same filename at the same time ? Do you have to implement a session mechanism to identify the two users or is there some underlying http/nginx/uwsgi/flask property that correctly map the request to the same call method? Thank you for your help! – Cyril N. Jun 19 '15 at 19:15
  • @CyrilN. Haven't thought about that. But if you've already got some authentication set up for your app, then use that. Otherwise you could interrogate 'request.remote_addr' and 'request.user_agent' to distinguish between simultaneous users. – petrus-jvrensburg Jun 20 '15 at 20:33
  • @petrus-jvrensburg, I'm unable to send my file names. Can you show your `front-end` code? –  Jun 04 '18 at 11:31
2

With Plupload, the solution might look like this:

$("#uploader").plupload({
    // General settings
    runtimes : 'html5,flash,silverlight,html4',
    url : "/uploads/",

    // Maximum file size
    max_file_size : '20mb',

    chunk_size: '128kb',

    // Specify what files to browse for
    filters : [
        {title : "Image files", extensions : "jpg,gif,png"},
    ],

    // Enable ability to drag'n'drop files onto the widget (currently only HTML5 supports that)
    dragdrop: true,

    // Views to activate
    views: {
        list: true,
        thumbs: true, // Show thumbs
        active: 'thumbs'
    },

    // Flash settings
    flash_swf_url : '/static/js/plupload-2.1.2/js/plupload/js/Moxie.swf',

    // Silverlight settings
    silverlight_xap_url : '/static/js/plupload-2.1.2/js/plupload/js/Moxie.xap'
});

And your Flask code in that case would be similar to this:

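import os

from flask import Flask, jsonify, request
# secure_filename lives in werkzeug.utils; the top-level import was removed in Werkzeug 1.0
from werkzeug.utils import secure_filename

app = Flask(__name__)

# Upload files
@app.route('/uploads/', methods=['POST'])
def results():
    content = request.files['file'].read()
    # file name as sent in the 'name' form field, sanitized before use
    filename = secure_filename(request.values['name'])

    # append mode: each chunk is added to the end of the file
    with open(filename, 'ab+') as fp:
        fp.write(content)

    # send response with appropriate mime type header
    return jsonify({
        "name": filename,
        "size": os.path.getsize(filename),
        "url": 'uploads/' + filename,})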

Plupload always sends chunks in exactly the same order, from first to last, so you do not have to bother with seek or anything like that.
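
One caveat with pure appending is that uploading the same file name a second time keeps adding to the old file. A small sketch of one way around that, assuming the client also posts a zero-based `chunk` form field with each request (check this against your Plupload version); `append_chunk` is a hypothetical helper name.

```python
import os


def append_chunk(path, data, chunk_index):
    """Write one chunk to `path` and return the resulting file size."""
    # 'wb' on the first chunk discards leftovers from an earlier upload
    # of the same name; 'ab' appends every later chunk in arrival order.
    mode = 'wb' if chunk_index == 0 else 'ab'
    with open(path, mode) as fp:
        fp.write(data)
    return os.path.getsize(path)
```

In the view, the index could be read with `int(request.values.get('chunk', 0))` and passed along with the sanitized file name and the chunk contents.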

Drachenfels