
I read 100 jpeg images in a loop and extract different areas from them.

Loop contents:

   VImage in = VImage::new_from_file(impath.c_str(),
          VImage::option()->
          set( "access", VIPS_ACCESS_SEQUENTIAL ) );

   VImage out = in.extract_area(x0, y0, x1 - x0, y1 - y0);
   cout << out.avg() << endl;

Or the same thing in python:

img_full = pyvips.Image.new_from_file(impath, access='sequential') 
img = img_full.extract_area(x0, y0, x1 - x0, y1 - y0)
print(img.avg()) 

I watch RSS (resident physical memory). It starts at around 40 MB and grows with each image.
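To log the same number from inside the script instead of watching top, here is a minimal sketch. It assumes Linux (it reads VmRSS from /proc/self/status) and Python 3. The helper names current_rss_mb and peak_rss_mb are hypothetical and not part of pyvips.

```python
import resource


def current_rss_mb():
    """Current resident set size in MB (Linux only: reads /proc/self/status)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:     40960 kB"
                return int(line.split()[1]) / 1024.0
    raise RuntimeError("VmRSS not found in /proc/self/status")


def peak_rss_mb():
    """Peak RSS so far in MB. ru_maxrss is in kB on Linux (bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
```

Calling current_rss_mb() after each print(img.avg()) gives one sample per image, which is enough to reproduce a graph like the one described here.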

Here is a graph of RSS during the run: [graph not included]

Here is the graph for images 4X the original size, with the same crop origin and width/height: [graph not included]

Why does this happen? Is there a leak somewhere? When I set a flag to trace leaks, pyvips.base.leak_set(1), I get about 60 MB reported. I also used cgroups to limit the process's physical memory to 100 MB: vips runs without crashing, but more slowly. For comparison, similar operations in OpenCV use an almost constant amount of RSS, 140 MB or 300 MB depending on the image size. In my experiments, vips wins several times over in CPU time, but loses several times over in memory.

pyvips version: 2.0.4

libvips version: 8.6.1

sbond

1 Answer


I tried this test program:

import sys
import pyvips
import random

for filename in sys.argv[1:]:
    image = pyvips.Image.new_from_file(filename, access='sequential')
    x = random.randint(0, image.width - 2)
    y = random.randint(0, image.height - 2)
    w = random.randint(1, image.width - x)
    h = random.randint(1, image.height - y)
    print 'filename =', filename, 'avg =', image.crop(x, y, w, h).avg()

I ran it like this:

$ mkdir samples
$ for i in {1..2000}; do cp ~/pics/k2.jpg samples/$i.jpg; done
$ python soak.py samples/*

k2.jpg is a 2k x 1.5k RGB jpg image. While it ran, I watched RES in top. It rose a little at the start, but after 100 or so images stabilized at around 75MB and stayed there for the remaining 1900 iterations. This is with py27, pyvips 2.0.4, libvips 8.6.1.
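One way to tell a steady leak apart from memory that rises and then levels off (as it did here after about 100 images) is to record RSS after each image and compare an early window of samples with a late one. A minimal sketch in Python 3; looks_like_plateau, its default window sizes and the 5 MB tolerance are illustrative assumptions, not anything provided by pyvips.

```python
def looks_like_plateau(samples_mb, warmup=100, window=100, tolerance_mb=5.0):
    """Return True if RSS stops growing once the warm-up samples are skipped.

    Compares the mean of the first `window` samples after `warmup` with the
    mean of the last `window` samples; a leak shows up as a large increase.
    """
    tail = samples_mb[warmup:]
    if len(tail) < 2 * window:
        raise ValueError("need at least warmup + 2 * window samples")
    early = sum(tail[:window]) / window
    late = sum(tail[-window:]) / window
    return late - early <= tolerance_mb
```

With one sample per file from a soak run like the one above, a True result points at a cache or pool that fills up and stops, while False suggests memory keeps growing.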

libvips keeps a cache of recent operations. Usually this is harmless (and helpful), but it can trigger unwanted memory use in some cases.
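To illustrate why an operation cache can hold on to memory, here is a toy LRU cache in plain Python 3. It is a simplified model under stated assumptions, not how libvips implements its cache: each cached result keeps its buffer alive until it is evicted, and a limit of 0 (similar in spirit to pyvips.cache_set_max(0)) means nothing is retained. ToyOpCache is a hypothetical name.

```python
from collections import OrderedDict


class ToyOpCache:
    """Toy LRU cache of operation results (NOT the libvips implementation)."""

    def __init__(self, max_ops):
        self.max_ops = max_ops
        self._entries = OrderedDict()

    def call(self, key, compute):
        """Return a cached result for `key`, or compute and maybe cache it."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        result = compute()
        if self.max_ops > 0:
            self._entries[key] = result
            while len(self._entries) > self.max_ops:
                self._entries.popitem(last=False)  # evict least recently used
        return result

    def held_bytes(self):
        """Bytes kept alive by cached results."""
        return sum(len(value) for value in self._entries.values())
```

With max_ops=3 and ten distinct 1000-byte results, the cache keeps 3000 bytes alive after the loop; with max_ops=0 it keeps none, at the cost of recomputing any repeated operation.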

You could try adding:

pyvips.cache_set_max(0)

near the start of your program. For this example, it dropped peak memory use from 75 MB to 38 MB.

jcupitt
  • I used your code and generated 2k x 1.5k images (width first). Indeed, memory stabilizes at around 99 MB, and with cache_set_max(0) it stays at 35-40 MB. However, in my original problem the images were 12000x2048, and the 4X images were 24000x4096; sorry, I forgot to mention that. To make sure the result doesn't depend on my particular image pixels, I generated 300 random 12000x2048 images. To simulate my real case, I also set a random crop width/height between 800 and 900, then ran your script again. Without the cache setting, memory slowly rises to 500 MB. – sbond Jan 31 '18 at 19:32
  • To continue: with cache_set_max(0) it stays under 200 MB. Using the option in real case 1 (12000x2048) kept memory use below 400 MB (compared to 600 MB before, see graph). For case 2 (24000x4096), it stayed under 700 MB (compared to 1.2 GB before). I still suspect vips isn't cleaning something up, because in both cases the very first image needs only 160 MB. The crop sizes and positions stay the same even though the full image size changes 4X, so I would expect it to use a similar amount of physical memory. – sbond Jan 31 '18 at 19:52
  • Hmm, strange. I tried 12000x2048 images here and RES in top bounced between 50 and 90MB. Could it be memory fragmentation from your platform's malloc implementation perhaps? I'm using Ubuntu 17.10. It could also be the number of cores you have -- I'm using a two core / four thread laptop -- since libvips memory use increases with the number of threads. You could try `export VIPS_CONCURRENCY=1` before running. – jcupitt Feb 01 '18 at 10:22
  • Also, libvips 8.6.2 has just been released, it would be worth trying with that, it has some changes to the way stats operations and sequential images work together. https://github.com/jcupitt/libvips/releases/tag/v8.6.2 – jcupitt Feb 01 '18 at 10:23
  • Thank you! export VIPS_CONCURRENCY=1 along with pyvips.cache_set_max(0) brought the memory under control, interestingly without any sacrifice in speed. I'll also try the new release. Appreciate your help! – sbond Feb 02 '18 at 00:01
  • Glad it's working! There's not a lot of parallelism in this example, most time will be spent in libjpeg decode, which is single-threaded. If you were doing something fancier you'd probably see some benefit from a larger threadpool. – jcupitt Feb 02 '18 at 09:16
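The two settings that fixed this can also be applied from inside Python. A sketch under stated assumptions: VIPS_CONCURRENCY is set through os.environ, which is equivalent to the shell export only if it happens before libvips starts up, so it runs before pyvips is imported; then the operation cache is disabled with pyvips.cache_set_max(0). The helper names set_vips_concurrency, load_pyvips and crop_average are hypothetical.

```python
import os


def set_vips_concurrency(n=1):
    """Limit libvips worker threads, like `export VIPS_CONCURRENCY=1`.

    Assumption: libvips reads this variable when it starts up, so call this
    before the first `import pyvips` in the process.
    """
    os.environ["VIPS_CONCURRENCY"] = str(n)


def load_pyvips(cache_max=0):
    """Import pyvips and cap its operation cache (0 disables caching)."""
    import pyvips  # deferred so set_vips_concurrency() can run first
    pyvips.cache_set_max(cache_max)
    return pyvips


def crop_average(vips, path, x, y, w, h):
    """Average pixel value of one crop, with sequential decoding."""
    image = vips.Image.new_from_file(path, access="sequential")
    return image.crop(x, y, w, h).avg()
```

Usage: call set_vips_concurrency(1) once, then vips = load_pyvips(), then crop_average(vips, path, x0, y0, x1 - x0, y1 - y0) inside the loop. As noted above, a single thread costs little here because most of the time goes into single-threaded JPEG decoding.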