
I'm hosting a file-access type website using CherryPy, through uWSGI and Nginx, on a Raspberry Pi. One thing I've noticed is that if the file is rather large (say, about a gigabyte), uWSGI reports it was killed by signal 9. This was remedied by adding cherrypy.config.update({'tools.sessions.timeout': 1000000}), but that doesn't really solve the problem; it's a hacky workaround that doesn't really work, and it mainly just causes another problem by making the timeout very large. In addition, the browser cannot estimate how long the download will take very accurately: it ends up hanging for a while (read: 5 or so minutes on a hardwired connection), and then rapidly starts downloading.

It starts as [screenshot: the file download hanging with no progress estimate], then goes to [screenshot: the continued file download].

My download code is very simple, consisting of just this single line:

return cherrypy.lib.static.serve_file(path,"application/x-download",os.path.basename(path))

My previous download code didn't work out well either:

f = file(path)
cherrypy.response.headers['Content-Type'] = getType(path)[0]
return f

Is there a way to remedy this?

ollien
  • It is a known issue with CherryPy and was supposed to be looked into circa version 3. Not sure how that went! The simplest solution would be to use a different framework, like Bottle or Flask. – Tymoteusz Paul Oct 07 '14 at 03:04
    Are there any solutions that don't involve changing my framework? It would be a shame to have to rewrite my whole app in another framework because of this... – ollien Oct 07 '14 at 03:19
  • Not off the top of my head, sorry. – Tymoteusz Paul Oct 07 '14 at 03:23
  • A solution could be delegating the file serving to the uWSGI offload engine: http://uwsgi-docs.readthedocs.org/en/latest/Snippets.html (check the first snippet for X-Sendfile emulation) – roberto Oct 07 '14 at 03:43
  • router_static isn't found. – ollien Oct 09 '14 at 04:36

1 Answer


General considerations

First of all, I have to say that CherryPy -> uWSGI -> Nginx is quite a piled-up configuration for such a constrained environment. According to the author, it's safe to use CherryPy on its own for small applications with no special requirements. Adding Nginx in front adds a lot of flexibility, so it's usually beneficial, but since CherryPy's default deployment is plain HTTP, I strongly suggest sticking with just those two (and forgetting about WSGI altogether).

Second, you probably already know that your problem is likely session-related, given the workaround you've tried. Here's a quote from the documentation about streaming the response body, which is what a file download is:

In general, it is safer and easier to not stream output. Therefore, streaming output is off by default. Streaming output and also using sessions requires a good understanding of how session locks work.

What it suggests is manual session lock management. Knowing how your application works should lead you to an appropriate lock design.

And third, there's usually a way to shift the duty of serving a file download to the web server, basically by sending an appropriate header with the filename from the proxied application. In Nginx's case this is called X-accel. That way you can avoid the hassle of lock management while still having session-restricted downloads.

Experiment

I've made a simple CherryPy app with two download options and put it behind Nginx. I experimented with a 1.3GiB video file on a local Linux machine in Firefox and Chromium. There were three ways:

  1. Un-proxied download from CherryPy (http://127.0.0.1:8080/native/video.mp4),
  2. Proxied download from CherryPy via Nginx (http://test/native/video.mp4),
  3. X-accel download from CherryPy via Nginx (http://test/nginx/video.mp4).

With (1) and (2) I had some minor strange behaviour in both Firefox and Chromium. With (1) on a Firefox instance with an uptime of several days, I consistently got ~5MiB/s download speed and one fully loaded CPU core; on a fresh Firefox there was no such behaviour. (2) on Chromium resulted in a couple of unfinished, interrupted downloads (all around the 1GiB mark). But in general both browsers achieved roughly the HDD's physical throughput of 50-70MiB/s.

With (3) I had no issues in either browser and the same 50-70MiB/s throughput, so in my small experiment it ended up as the most stable approach.

Setup

app.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-


import os

import cherrypy


DownloadPath = '/home/user/Videos'

config = {
  'global' : {
    'server.socket_host' : '127.0.0.1',
    'server.socket_port' : 8080,
    'server.thread_pool' : 8
  }
}


class App:

  @cherrypy.expose
  def index(self):
    return 'Download test'

  @cherrypy.expose
  def native(self, name):
    basename = os.path.basename(name)
    filename = os.path.join(DownloadPath, basename)
    mime     = 'application/octet-stream'
    return cherrypy.lib.static.serve_file(filename, mime, basename)

  @cherrypy.expose
  def nginx(self, name):
    basename = os.path.basename(name)
    cherrypy.response.headers.update({
      'X-Accel-Redirect'    : '/download/{0}'.format(basename),
      'Content-Disposition' : 'attachment; filename={0}'.format(basename),
      'Content-Type'        : 'application/octet-stream'
    })


if __name__ == '__main__':
  cherrypy.quickstart(App(), '/', config)

app.conf

server {
  listen  80;

  server_name test;

  root /var/www/test/public;

  location /resource {
    # static files like images, css, js, etc.
    access_log off;
  }

  location / {
    proxy_pass         http://127.0.0.1:8080;
    proxy_set_header   Host             $host;
    proxy_set_header   X-Real-IP        $remote_addr;
    proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
  }

  location /download {
    internal;
    alias /home/user/Videos;
  }

}
saaj
  • I'd rather not drop uWSGI. I like having it as a middleman because it can have multiple workers. – ollien Oct 09 '14 at 04:36
  • @ollien *CherryPy* is a threaded server, so it does have multiple workers. If you mean multiple process workers, then that only makes sense if you have a CPU-bound workload. Anyway, if you want to keep the complicated configuration and have more points of failure, my answer still applies. You can manually manage session locks, or try ``X-accel``. With the latter, I don't think the header means anything to *uWSGI*, so it should just pass it through to *Nginx*. – saaj Oct 09 '14 at 08:34