
I'm trying to diagnose why my Python server app is leaking memory. The app takes a request for an image URL, resizes the image using Vips, and returns it. After every request, memory usage grows by roughly the size of the original image.

from fapws import base
import fapws._evwsgi as evwsgi
from gi.repository import Vips
import urllib2
import hmac
import hashlib
import base64
import StringIO
from boto.s3.connection import S3Connection
from boto.s3.bucket import Bucket

def start():
    evwsgi.start('0.0.0.0', '80')
    evwsgi.set_base_module(base)

    def lfrThumbnail(environ, start_response):
        try:
            parameters = environ['PATH_INFO'].split('/')
            s3File = 'my s3 url' + parameters[0]
            width = float(parameters[1])
            height = float(parameters[2])
            hmacSignatureUser = parameters[3]

            hmacSignature = some hashing code...

            if not (hmacSignatureUser == hmacSignature):
                print hmacSignatureUser
                print hmacSignature
                print hmacSignatureUser == hmacSignature
                raise Exception

            bufferedImage = urllib2.urlopen(s3File).read()
            image = Vips.Image.new_from_buffer(bufferedImage, '')

            imageWidth = float(image.width)
            imageHeight = float(image.height)
            imageAspectRatio =  imageWidth / imageHeight
            if (width > imageWidth) or (height > imageHeight):
                pass  # requested size exceeds the source, so return the image unscaled
            elif abs((imageAspectRatio / (width/height)) - 1) < 0.05:
                image = image.resize(width / imageWidth)
            else:
                scaleRatioWidth = width / imageWidth
                scaleRatioHeight = height / imageHeight
                maxScale = max(scaleRatioWidth, scaleRatioHeight)
                image = image.resize(maxScale)
                cropStartX = (image.width - width) / 2
                cropStartY = (image.height - height) / 2
                image = image.crop(cropStartX, cropStartY, width, height)

        except Exception, e:
            start_response('500 INTERNAL SERVER ERROR', [('Content-Type','text/plain')])
            return ['Error generating thumbnail']

        start_response('200 OK', [
            ('Content-Type','image/jpeg'),
            ('Cache-Control', 'max-stale=31536000')
        ])
        return [image.write_to_buffer('.jpg[Q=90]')]

    evwsgi.wsgi_cb(('/lfr/', lfrThumbnail))

    evwsgi.set_debug(0)
    evwsgi.run()

if __name__ == '__main__':
    start()

I've tried using muppy and the pympler tracker, but each diff after the image open/close operations showed only a couple of bytes being used.

Could the external C libraries be the cause of the memory leak? If so, how does one debug that?

In case it's relevant, I'm running the Python server inside a Docker container.

jcupitt
Nicholas
1 Answer


I'm the libvips maintainer. It sounds like the vips operation cache: vips keeps the last few operations in memory and reuses the results if it can. This can be a huge performance win in some cases.

For a web service, you're probably caching elsewhere so you won't want this, or you won't want a large cache at least. You can control the cache size with vips_cache_set_max() and friends:

http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/VipsOperation.html#vips-cache-set-max

From Python it's:

Vips.cache_set_max(0)

This turns off the cache completely. You can also limit the cache by memory use, file descriptor use, or number of operations.
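As a minimal sketch of those three limits, the helper below takes the imported Vips module as an argument and applies whichever limits are given. The helper name and its parameters are hypothetical; it assumes the binding exposes cache_set_max_mem() and cache_set_max_files() next to cache_set_max(), mirroring the C functions vips_cache_set_max_mem() and vips_cache_set_max_files().

```python
def configure_vips_cache(vips, max_ops=0, max_mem=None, max_files=None):
    """Apply operation-cache limits to an imported Vips module.

    `vips` is the module object, e.g. from `from gi.repository import Vips`.
    This helper and its argument names are illustrative, not part of libvips.
    """
    # Maximum number of operations kept in the cache; 0 disables caching.
    vips.cache_set_max(max_ops)
    if max_mem is not None:
        # Assumed binding of vips_cache_set_max_mem(): limit in bytes.
        vips.cache_set_max_mem(max_mem)
    if max_files is not None:
        # Assumed binding of vips_cache_set_max_files(): open file limit.
        vips.cache_set_max_files(max_files)
```

In the server from the question, this could be called once at startup, before evwsgi.run(), for example configure_vips_cache(Vips) to disable the cache outright.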

There are a couple of other useful things you can set to watch resource usage. Vips.leak_set(True) makes vips report leaked objects on exit, and also report peak pixel buffer memory use. Vips.cache_set_trace(True) makes it trace all operations as they are called, and shows cache hits.
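One way to keep these diagnostics out of production is to switch them on only when an environment variable is set. This is a sketch under assumptions: the variable name THUMBNAILER_DEBUG and the helper itself are made up for illustration, and the Vips module is passed in rather than imported here.

```python
import os


def enable_vips_diagnostics(vips, environ=None):
    """Turn on libvips leak reporting and operation tracing when requested.

    THUMBNAILER_DEBUG is a hypothetical variable name chosen for this sketch.
    Returns True if the diagnostics were enabled.
    """
    environ = os.environ if environ is None else environ
    enabled = environ.get("THUMBNAILER_DEBUG") == "1"
    if enabled:
        # Report leaked objects and peak pixel buffer memory on exit.
        vips.leak_set(True)
        # Trace every operation as it is called, including cache hits.
        vips.cache_set_trace(True)
    return enabled
```

Calling enable_vips_diagnostics(Vips) once at startup would leave normal runs untouched and produce the extra output only when the variable is set to 1.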

In your code, I would also enable sequential mode. Add access = Vips.Access.SEQUENTIAL to your new_from_buffer().

The default behaviour is to open images for full random access (since vips doesn't know what operations you'll end up running on the image). For formats like JPEG, this means that vips will decode the image to a large uncompressed array on open. If the image is under 100 MB, it'll keep this array in memory.

However for a simple resize, you only need to access pixels top-to-bottom, so you can hint sequential access on open. In this mode, vips will only decompress a few scanlines at once from your input and won't ever keep the whole uncompressed image around. You should see a nice drop in memory use and latency.
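A minimal sketch of a resize that uses the sequential hint might look like the following. The function name and its parameters are hypothetical, and the Vips module is passed in as an argument; the calls themselves (new_from_buffer with an access keyword, resize, write_to_buffer) are the ones already used in the question and described above.

```python
def shrink_jpeg(vips, data, scale, quality=90):
    """Resize an encoded image held in `data` and return JPEG bytes.

    `vips` is the imported module, e.g. from `from gi.repository import Vips`.
    The name and signature of this helper are illustrative only.
    """
    # Hint that pixels will be read top-to-bottom, so vips can stream the
    # input instead of decoding the whole image into memory first.
    image = vips.Image.new_from_buffer(data, "", access=vips.Access.SEQUENTIAL)
    image = image.resize(scale)
    return image.write_to_buffer(".jpg[Q={0}]".format(quality))
```

With the question's variables this would be called as shrink_jpeg(Vips, bufferedImage, scale), where scale is a factor below 1 computed from the requested size.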

There are a lot of other things you could handle, such as EXIF autorotate, colour management, transparency, and JPEG shrink-on-load, as I'm sure you know. The sources to vipsthumbnail might be a useful reference:

https://github.com/jcupitt/libvips/blob/master/tools/vipsthumbnail.c
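To illustrate the shrink-on-load idea: JPEG decoders can reduce an image by a factor of 2, 4, or 8 while decoding, so most of the reduction can happen before any pixels reach the resize step. The sketch below picks the largest such factor that does not overshoot the requested scale. It is a simplified heuristic written for this example, not the logic vipsthumbnail itself uses, and the function name is hypothetical.

```python
def jpeg_shrink_factor(scale):
    """Pick a decode-time shrink factor (1, 2, 4 or 8) for a target scale.

    `scale` is the overall resize factor, e.g. 0.1 for a tenth of the size.
    Returns the largest factor whose result is still at least as big as
    the target, so the remaining resize is always a reduction.
    """
    for shrink in (8, 4, 2):
        # After shrinking on load, the residual scale is scale * shrink;
        # it must not exceed 1 or the image would need enlarging again.
        if scale * shrink <= 1.0:
            return shrink
    return 1


# Residual factor left for the ordinary resize after shrinking on load.
target_scale = 0.1
shrink = jpeg_shrink_factor(target_scale)
residual_scale = target_scale * shrink
```

The chosen factor would then be passed to the JPEG loader's shrink option, with the residual scale given to resize().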

jcupitt
  • Using Vips.leak_set(True), vips only reports peak memory usage when the server is stopped, which is also when the Python script exits. Should I be closing vips after processing a request? – Nicholas Nov 04 '15 at 09:15
  • I'd leave vips running, since you'll save the startup/shutdown time. The peak memory reporting is handy for testing but not so useful in production. The C API has a function to get the current peak memory, but it's not exposed to Python; I'll add this. libvips has a fairly large test suite, all written in Python, and one of the tests is "does the entire test suite complete with no memory leaks?", so hopefully you shouldn't see leaks in your server, and the leak report is more of a double-check. – jcupitt Nov 04 '15 at 10:05