I am writing a web service in Django to handle image/video streams, but most of the work is done by an external program. For instance:

  1. The client requests /1.jpg?size=300x200
  2. Python code in Django (or another WSGI app) parses 300x200
  3. Python calls convert (part of ImageMagick) through the subprocess module, passing 300x200 as a parameter
  4. convert reads 1.jpg from local disk and resizes it accordingly
  5. convert writes the result to a temp file
  6. Django builds an HttpResponse() and reads the whole temp file content as the body

As you can see, writing the temp file and then reading it back is inefficient. I need a generic way to handle external programs like this: not only convert, but also others like cjpeg, ffmpeg, etc., or even proprietary binaries.

I want to implement it this way:

  1. Python gets the stdout fd of the convert child process
  2. Python chains it to the WSGI socket fd for output

I've done my homework: Google says this kind of zero-copy transfer can be done with the splice() system call, but it's not available in Python. So how can I maximize performance in Python for this kind of scenario?

  1. Call splice() using ctypes?
  2. Hack memoryview() or buffer()?
  3. The subprocess's stdout has readinto(); could this be utilized somehow?
  4. How could we get the socket fd number in any WSGI app?
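For option 1, newer Python versions (3.10+, Linux only) expose the system call directly as `os.splice`, so ctypes is no longer required there. The following is a minimal sketch, not a drop-in solution: `splice_all` is a hypothetical helper name, and it assumes at least one of the two descriptors is a pipe, which is what `splice()` requires. It falls back to an ordinary read/write copy when `os.splice` is missing or fails.

```python
import os


def splice_all(src_fd, dst_fd, chunk=64 * 1024):
    """Move bytes from src_fd to dst_fd until EOF and return the byte count.

    Hypothetical helper. Tries os.splice (Python 3.10+, Linux, one fd must
    be a pipe) and falls back to a user-space read/write copy otherwise.
    """
    total = 0
    use_splice = hasattr(os, "splice")
    while True:
        if use_splice:
            try:
                moved = os.splice(src_fd, dst_fd, chunk)
            except OSError:
                # splice not usable for these descriptors: copy by hand instead
                use_splice = False
                continue
            if moved == 0:  # EOF on the source
                break
        else:
            data = os.read(src_fd, chunk)
            if not data:  # EOF on the source
                break
            offset = 0
            while offset < len(data):  # os.write may write only part of the data
                offset += os.write(dst_fd, data[offset:])
            moved = len(data)
        total += moved
    return total
```

In a real server, `src_fd` would be `proc.stdout.fileno()` of the child process and `dst_fd` the client socket's descriptor. The WSGI specification does not define a standard way to obtain that socket, so question 4 depends on the particular server.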

I am kind of a newbie at this; any suggestion is appreciated, thanks!

est
  • It'll be quite expensive to spawn a subprocess for each request. How about using [`PIL`](http://www.pythonware.com/products/pil/) to do it in the same process, and also avoid creating a temporary file? – Aya May 02 '13 at 10:11
  • 1. PIL's result is not as good. 2. I didn't mean Imagemagick exclusively, but some other external program as well, ffmpeg, cjpeg, etc. – est May 02 '13 at 12:46
  • 1. What scaling algorithm are you using in ImageMagick? 2. Many commonly used external programs have Python bindings which would be much faster than spawning subprocesses, e.g. [pyffmpeg](http://code.google.com/p/pyffmpeg/). – Aya May 02 '13 at 12:51
  • I am developing a system which depends on a binary only external program, just taking `convert` as an easy example. Btw spawning subprocess might be slow, but you can always hold a pool of process open. – est May 02 '13 at 13:12
  • Point is, if your goal is to maximize efficiency, then the best solution will depend on the specific binary. e.g. for `convert` the saving gained by using `splice()` on an existing fd will be in the microsecond range, but the saving gained using a Python binding will more likely be in the millisecond range. And how exactly would you hold open a pool of `convert` processes? – Aya May 02 '13 at 13:21
  • The current solution is a temp file, which is a waste of disk storage, and Django has to read the whole file to serve it. If you have anything better please tell me. To hold a process open, use `subprocess.Popen()` and only write data to its stdin when a request comes through. – est May 02 '13 at 13:29
  • Well, using `Popen()` as fallback for cases where no binding exists will be better than writing a temporary file, but I assumed you'd already figured that out from the content of the question. My point is that `splice()` is not likely to yield a significant performance boost over transferring the data from the subprocess via a buffer. As far as I'm aware the `convert` binary will only allow you to convert one image per instance, so it will necessitate a separate call to `Popen()` for each image you want to convert, resulting in a new process being created each time. – Aya May 02 '13 at 13:36
  • You don't understand: the output buffer of `convert` and the output buffer of the HTTP response socket could be the exact same one, so any data `convert` yields is consumed by the socket immediately. It's basically a kernel-level pipe without any ring 3 read-write process. That's the mechanism behind HAProxy and many other high-performance soft routers. Thanks for the effort, but I need to address the temp file read-write problem, not change the way the external program works. A prefork process pool is easy to implement in Python, but `splice` is not. – est May 02 '13 at 13:52

2 Answers

If the goal is to increase performance, you ought to examine the bottlenecks on a case-by-case basis, rather than taking a "one solution fits all" approach.

For the convert case, assuming the images aren't insanely large, the bottleneck there will most likely be spawning a subprocess for each request.

I'd suggest avoiding both the subprocess and the temporary file, and doing the whole thing in the Django process using PIL, with something like this:

import os
from PIL import Image
from django.http import HttpResponse

IMAGE_ROOT = '/path/to/images'

# A Django view which returns a resized image
# Example parameters: image_filename='1.jpg', width=300, height=200
def resized_image_view(request, image_filename, width, height):
    full_path = os.path.join(IMAGE_ROOT, image_filename)
    source_image = Image.open(full_path)
    # URL captures may arrive as strings, but resize() needs integers
    resized_image = source_image.resize((int(width), int(height)))
    # HttpResponse is file-like, so PIL can write the JPEG straight into it
    response = HttpResponse(content_type='image/jpeg')
    resized_image.save(response, 'JPEG')
    return response

You should be able to get results identical to ImageMagick by using the correct scaling algorithm, which in general is ANTIALIAS (named LANCZOS in current Pillow releases) for cases where the rescaled image is less than 50% of the size of the original, and BICUBIC in all other cases.
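The rule of thumb above could be written as a small helper. This is only a sketch: `pick_resample_filter` is a hypothetical name, and it assumes "size" refers to linear dimensions (width and height) rather than pixel area. It returns the filter name as a string so the snippet runs without PIL installed.

```python
def pick_resample_filter(src_size, dst_size):
    """Return the name of the PIL filter suggested by the 50% rule of thumb.

    src_size and dst_size are (width, height) tuples.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    # Assumption: compare linear dimensions and let the more heavily
    # shrunk axis decide.
    shrink_ratio = min(dst_w / src_w, dst_h / src_h)
    return "LANCZOS" if shrink_ratio < 0.5 else "BICUBIC"
```

In the view it might be used as `source_image.resize(size, getattr(Image, pick_resample_filter(source_image.size, size)))`.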

For videos, if you're returning a transcoded video stream, the bottleneck will likely be either CPU time or network bandwidth.

Aya
  • Hi @Aya lots of thanks, but how do you guys deal with the temp file read-write problem? Could it be avoided at all? – est May 02 '13 at 13:59
  • Well, it depends on how much data you're expecting, and the interface to the program. For programs which are capable of dumping their output to stdout, an anonymous pipe should suffice. You can wrap the pipe's output with an iterator which calls `read()` then use a [`StreamingHttpResponse`](https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.StreamingHttpResponse) to allow the network socket to consume the data at whatever rate it needs to, which will depend on the available bandwidth. – Aya May 02 '13 at 14:07
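The iterator described in the last comment might look roughly like the sketch below. The names `iter_process_output` and `DEMO_CMD` are hypothetical; `DEMO_CMD` is only a stand-in child process so the snippet runs without ImageMagick, and a real view would pass something like `['convert', '1.jpg', '-thumbnail', '200x150', '-']` instead.

```python
import subprocess
import sys

# Hypothetical stand-in for an external tool that writes its result to stdout
DEMO_CMD = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(b'x' * 200000)"]


def iter_process_output(cmd, chunk_size=64 * 1024):
    """Yield the stdout of cmd in chunks of at most chunk_size bytes."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    try:
        while True:
            # Blocks until chunk_size bytes are available or the child hits EOF
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        # Runs on normal exhaustion and when the consumer stops early
        proc.stdout.close()
        proc.wait()
```

In Django this generator could be passed straight to the response, for example `StreamingHttpResponse(iter_process_output(cmd), content_type='image/jpeg')`, so no temporary file is involved and only one chunk is held in memory at a time.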

I found that WSGI can actually take a file object (here, the stdout pipe of the subprocess) as the iterable response.

Example WSGI app:

import subprocess

def image_app(environ, start_response):
    # 'Connection' is a hop-by-hop header, which WSGI apps must not set
    start_response('200 OK', [('Content-Type', 'image/jpeg')])
    proc = subprocess.Popen([
        'convert',
        '1.jpg',
        '-thumbnail', '200x150',
        '-',  # write the result to stdout
    ], stdout=subprocess.PIPE,
       stderr=subprocess.DEVNULL)  # an unread stderr pipe could block convert
    return proc.stdout

It wraps the child's stdout pipe as the HTTP response body.
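A possible variation, sketched here under assumptions rather than taken from the original answer: iterating a binary file object directly splits the data at newline bytes, so the chunks have arbitrary sizes. Wrapping the pipe in the server's `wsgi.file_wrapper` (or `wsgiref.util.FileWrapper` as a fallback) reads fixed-size blocks instead. `make_pipe_app` and `DEMO_CMD` are hypothetical names, and `DEMO_CMD` merely stands in for the `convert` command line.

```python
import subprocess
import sys
from wsgiref.util import FileWrapper

# Hypothetical stand-in for ['convert', '1.jpg', '-thumbnail', '200x150', '-']
DEMO_CMD = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(b'\\n'.join([b'y' * 999] * 100))"]


def make_pipe_app(cmd, content_type, blksize=64 * 1024):
    """Build a WSGI app that streams the stdout of cmd as the response body."""
    def app(environ, start_response):
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL)
        start_response('200 OK', [('Content-Type', content_type)])
        # Prefer the server's own wrapper when it offers one
        wrapper = environ.get('wsgi.file_wrapper', FileWrapper)
        return wrapper(proc.stdout, blksize)
    return app
```

This sketch never calls `proc.wait()`, so a long-running server would probably want to reap the child, for instance from the `close()` method of a custom iterable.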

est