
I'm writing a Python (3.4.3) program that uses VIPS (8.1.1) on Ubuntu 14.04 LTS to read many small tiles with multiple threads and assemble them into one large image.

In a very simple test:

from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Lock
from gi.repository import Vips

canvas = Vips.Image.black(8000, 1000, bands=3)
lock = Lock()

def do_work(x):
    global canvas
    img = Vips.Image.new_from_file('part.tif')    # RGB tiff image
    with lock:
        canvas = canvas.insert(img, x*1000, 0)

with ThreadPoolExecutor(max_workers=8) as executor:
    for x in range(8):
        executor.submit(do_work, x)

canvas.write_to_file('complete.tif')

I get the correct result. In my full program, the work for each thread involves reading binary data from a source file, turning it into tiff format, reading the image data, and inserting it into the canvas. It seems to work, but when I try to examine the result, I run into trouble. Because the image is extremely large (~50000*100000 pixels), I couldn't save the entire image in one file, so I tried

canvas = canvas.resize(.5)
canvas.write_to_file('test.jpg')

This takes an extremely long time, and the resulting jpeg has only black pixels. If I resize three times, the program gets killed. I also tried

canvas.extract_area(20000,40000,2000,2000).write_to_file('test.tif')

This results in the error message segmentation fault (core dumped), but it does save an image. There are image contents in it, but they seem to be in the wrong place.

I'm wondering what the problem could be?

Below is the code for the complete program. The same logic was also implemented using OpenCV + sharedmem (sharedmem handled the multiprocessing part), and it worked without a problem.

import os
import subprocess
import pickle
from multiprocessing import Lock
from concurrent.futures import ThreadPoolExecutor
import threading
import numpy as np
from gi.repository import Vips

lock = Lock()

def read_image(x):
    with open(file_name, 'rb') as fin:
        fin.seek(sublist[x]['dataStartPos'])
        temp_array = np.fromfile(fin, dtype='int8', count=sublist[x]['dataSize'])

    name_base = os.path.join(rd_path, threading.current_thread().name + 'tempimg')
    with open(name_base + '.jxr', 'wb') as fout:
        temp_array.tofile(fout)
    subprocess.call(['./JxrDecApp', '-i', name_base + '.jxr', '-o', name_base + '.tif'])
    temp_img = Vips.Image.new_from_file(name_base + '.tif')
    with lock:
        global canvas
        canvas = canvas.insert(temp_img, sublist[x]['XStart'], sublist[x]['YStart'])

def assemble_all(filename, ramdisk_path, scene):
    global canvas, sublist, file_name, rd_path, tilesize_x, tilesize_y
    file_name = filename
    rd_path = ramdisk_path
    file_info = fetch_pickle(filename)   # A custom function
    # this info includes where to begin reading image data, image size and coordinates
    tilesize_x = file_info['sBlockList_P0'][0]['XSize']
    tilesize_y = file_info['sBlockList_P0'][0]['YSize']
    sublist = [item for item in file_info['sBlockList_P0'] if item['SStart'] == scene]
    max_x = max([item['XStart'] for item in file_info['sBlockList_P0']])
    max_y = max([item['YStart'] for item in file_info['sBlockList_P0']])
    canvas = Vips.Image.black((max_x+tilesize_x), (max_y+tilesize_y), bands=3)

    with ThreadPoolExecutor(max_workers=4) as executor:
        for x in range(len(sublist)):
            executor.submit(read_image, x)

    return canvas

The above module (imported as mcv) is called in the driver script:

canvas = mcv.assemble_all(filename, ramdisk_path, 0)

To examine the content, I used

canvas.extract_area(25000, 40000, 2000, 2000).write_to_file('test_vips1.jpg')
user3667217
  • I thought of another possible problem: vips uses a recursive algorithm to walk pipelines and work out which bit to calculate next. If your pipeline is extremely long and the C stack on your platform is rather small, you can get a C stack overflow. Could that be the problem? I would edit your question to include a complete example program that fails, plus a description of your platform and the vips version. You could also open an issue on the vips tracker, it's a better place for debugging: https://github.com/jcupitt/libvips/issues – jcupitt Nov 12 '15 at 13:32
  • Thanks for your comment ! I'll have to read on C stack overflow. I'm not quite sure what it means... I've included complete program here. The file to be read from contains ~5000 small tiles saved in .jxr format. – user3667217 Nov 12 '15 at 17:41

1 Answer

I think your problem has to do with the way libvips calculates pixels.

In systems like OpenCV, images are huge areas of memory. You perform a series of operations, and each operation modifies a memory image in some way.

libvips is not like this, though the interface looks similar. In libvips, when you perform an operation on an image, you are actually just adding a new section to a pipeline. It's only when you finally connect the output to some sink (a file on disk, or a region of memory you want filled with image data, or an area of the display) that libvips will actually do any calculations. libvips will then use a recursive algorithm to run a large set of worker threads up and down the whole length of the pipeline, evaluating all of the operations you created at the same time.

To make an analogy with programming languages, systems like OpenCV are imperative, libvips is functional.

The good thing about the way libvips does things is that it can see the whole pipeline at once and it can optimise away most of the memory use and make good use of your CPU. The bad thing is that long sequences of operations can need large amounts of stack to evaluate (whereas with systems like OpenCV you are more likely to be bounded by image size). In particular, the recursive system used by libvips to evaluate means that pipeline length is limited by the C stack, about 2MB on many operating systems.

Here's a simple test program that does more or less what you are doing:

#!/usr/bin/python3

import sys
import pyvips

if len(sys.argv) < 4:
    print("usage: %s image-in image-out n" % sys.argv[0])
    print("   make an n x n grid of image-in")
    sys.exit(1)

tile = pyvips.Image.new_from_file(sys.argv[1])
outfile = sys.argv[2]
size = int(sys.argv[3])

img = pyvips.Image.black(size * tile.width, size * tile.height, bands=3)

for y in range(size):
    for x in range(size):
        img = img.insert(tile, x * tile.width, y * tile.height)

# we're not interested in huge files for this test, just write a small patch
img.crop(10, 10, 100, 100).write_to_file(outfile)

You run it like this:

time ./bigjoin.py ~/pics/k2.jpg out.tif 2
real    0m0.176s
user    0m0.144s
sys 0m0.031s

It loads k2.jpg (a 2k x 2k JPEG image), repeats it into a 2 x 2 grid, and saves a small part of the result. This program will work well with very large images; try removing the crop and running it as:

./bigjoin.py huge.tif out.tif[bigtiff] 10

and it'll copy the huge tiff image 100 times into a REALLY huge tiff file. It'll be quick and use little memory.

However, this program will become very unhappy with small images being copied many times. For example, on this machine (a Mac), I can run:

./bigjoin.py ~/pics/k2.jpg out.tif 26

But this fails:

./bigjoin.py ~/pics/k2.jpg out.tif 28
Bus error: 10

With a 28 x 28 output, that's 784 tiles. The way we've built the image, repeatedly inserting a single tile, that's a pipeline 784 operations long -- long enough to cause a stack overflow. On my Ubuntu laptop I can get pipelines up to about 2,900 operations long before it starts failing.

There's a simple way to fix this program: build a wide rather than a deep pipeline. Instead of inserting a single image each time, make a set of strips, then join the strips. Now the pipeline depth will be proportional to the square root of the number of tiles. For example:

img = pyvips.Image.black(size * tile.width, size * tile.height, bands=3)

for y in range(size):
    strip = pyvips.Image.black(size * tile.width, tile.height, bands=3)
    for x in range(size):
        strip = strip.insert(tile, x * tile.width, 0)
    img = img.insert(strip, 0, y * tile.height)

Now I can run:

./bigjoin2.py ~/pics/k2.jpg out.tif 200

Which is 40,000 images joined together.

jcupitt
  • Thanks for the response ! I figured I need to do the threading myself because these small images are actually in .jxr format. The work for each thread is : open a file, read a chunk in binary, save in .jxr, call JxrDecApp to decode and save in .tiff, call Vips to read into memory. I suppose this can't be handled by Vips itself ? – user3667217 Nov 11 '15 at 22:10
  • Can you get the jxr decoder to write the uncompressed byte array to memory? If you can, you could skip the tiff save / load and make a vips image directly. – jcupitt Nov 11 '15 at 22:41
  • That's my next step. Now, the openly available jxrlib only reads and writes files. I'll have to understand the jxr decoder and modify it so it can directly decode binary data into memory. But even if I do so, don't I have to handle the threading myself ? And still, I'm not sure what went wrong in my implementation I posted..? – user3667217 Nov 11 '15 at 22:44
  • Your example should work, and it sounds like it does, but you're still threading something that doesn't need to be threaded. I would make your source tiffs in parallel, but build the vips pipeline single-threaded. Also, use bigtiff: `canvas.write_to_file('complete.tif', bigtiff = True)`. – jcupitt Nov 11 '15 at 22:51
  • Preparing source tiff in parallel and reading in single-threaded sounds good. I'll certainly try that. But still, my issue is not resolved. As I posted in my question, when I tried to save part of the image, I got very strange result. Furthermore, canvas.extract_area seems to take a rather long time, but in my final implementation, I do need to save multiple parts of the large image – user3667217 Nov 12 '15 at 08:37
  • I updated my example, don't know if that helps. Try swapping the last line for `img.crop(20000,40000,2000,2000).write_to_file("complete.tif")`, it seems to work for me. Could you post a complete example of something that fails for you? – jcupitt Nov 12 '15 at 09:52
  • I've rewritten the answer, does that help? – jcupitt Nov 13 '15 at 10:42
  • I had a poke about in the code and found a way to slim down the stack use per operation. It was 1120 bytes per call, it's 512 with this patch: https://github.com/jcupitt/libvips/commit/3b32200cc112d55b5ebb87043bc4fcdd9af96223 ... thanks for raising this issue, it's a nice improvement. – jcupitt Nov 13 '15 at 14:55
  • Thanks for this detailed explanation ! It makes sense now to me now. I'll need some more help on the implementation, though. There is some overlap between the tiles, and it's somewhat irregular. If I break the large image into strips, I'll have to leave some overlap between strips so they can joined nicely. This means that the small tiles sitting on the border between the strips will have to be read twice. This certainly is not ideal. I have thousands of tiles to insert, how to break up the task nicely seems unclear to me... – user3667217 Nov 13 '15 at 16:34
  • Alternatively, is there any way to check if there is a stack overflow? Or to check if the tasks in the pipeline have been cleared? If I can implement a check before sending more tasks into the pipeline, that would seem to solve the problem? – user3667217 Nov 13 '15 at 16:36
  • Predicting stack overflow is not easy in C, unfortunately :( you'll probably have to render the image in sections. How large are each of the 5,000 tiles? Can you hold all the source images in memory? You could load the position of each image, sort by Y, partition into 10 (?) equal chunks, load the images for the top chunk and render that part, then throw away all the images you don't need for chunk 2, load any you are still missing, render chunk 2 ... and so on. – jcupitt Nov 13 '15 at 17:05
  • vips has operations that do soft, feathered joins, would they be useful? You'd need to be able to express your problem as a set of pairwise left-right or top-bottom joins. We should move this to chat, or switch to the vips issue tracker. – jcupitt Nov 13 '15 at 17:08