
I have thousands of 256 x 256 pixel grayscale tiles (dtype np.uint8) that I want to combine into a single pyramidal BigTIFF image as fast as possible.

My current approach is to create a numpy array the size of the final image and paste all the tiles into it (this only takes a few seconds). For saving, I have looked into multiple approaches.
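For reference, the paste step can be sketched like this (the grid layout, tile dictionary, and helper name are my own illustration, not the original code):

```python
import numpy as np

TILE = 256  # assumed tile size

def paste_tiles(tiles, grid_h, grid_w):
    """Paste a dict of {(row, col): tile} uint8 tiles into one big canvas."""
    canvas = np.zeros((grid_h * TILE, grid_w * TILE), np.uint8)
    for (r, c), tile in tiles.items():
        canvas[r * TILE:(r + 1) * TILE, c * TILE:(c + 1) * TILE] = tile
    return canvas

# e.g. two tiles on a 2 x 2 grid
tiles = {(0, 0): np.full((TILE, TILE), 1, np.uint8),
         (1, 1): np.full((TILE, TILE), 2, np.uint8)}
big = paste_tiles(tiles, 2, 2)
```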

1) Tifffile, using the imsave function. This turned out to be very slow: I would estimate at least 10 minutes for a file that ends up at around 700 MB.

2) pyvips, by converting the massive numpy array to a pyvips image using pyvips.Image.new_from_memory, and then saving it with:

    vips_img.tiffsave(filename, tile=True, compression='lzw', bigtiff=True, pyramid=True, Q=80)

Constructing the vips_img takes ~42 seconds and saving it to disk takes another ~30, but all of this runs on a single thread. I am wondering whether there is a more time-efficient way to do this, either with a different method or by leveraging multithreading. High-speed storage is available, so the data could first be saved in a different format, or the work could be moved to a different programming language if needed.

Just brainstorming: all the tiles come from an already existing BigTIFF image, have been put through a preprocessing pipeline, and now need to be saved again. I'm wondering if there could be a way to copy the original file and efficiently replace the tile data inside it.

Edit with more information:

The dimensions of the image are roughly 55k by 45k pixels, but I would like to use this code for larger images too, up to 150k by 150k for example.

For the image of 55k by 45k and tiles of 256 by 256, we're talking about ~53k tiles. These tiles don't all contain information I'm interested in, so in the end I might save only ~50% of the tiles again; the remainder of the image can stay black. Saving the processed image in the same format seems the most convenient approach to me, as I would like to display it as an overlay.
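Since the canvas starts out zero-filled, the tiles without information never need to be touched: they can simply be skipped during pasting. A rough sketch (the `tile.any()` check and helper name are my own, not from the original pipeline):

```python
import numpy as np

TILE = 256  # assumed tile size

def paste_nonblack(canvas, tiles):
    """Paste only tiles that contain information; all-black tiles are skipped."""
    written = 0
    for (r, c), tile in tiles.items():
        if not tile.any():  # all-zero tile: the canvas is already black there
            continue
        canvas[r * TILE:(r + 1) * TILE, c * TILE:(c + 1) * TILE] = tile
        written += 1
    return written

canvas = np.zeros((2 * TILE, 2 * TILE), np.uint8)
tiles = {(0, 0): np.full((TILE, TILE), 9, np.uint8),
         (0, 1): np.zeros((TILE, TILE), np.uint8)}
n = paste_nonblack(canvas, tiles)  # only the first tile is actually written
```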

Edit with intermediate solution

Earlier I mentioned that creating a pyvips image from a numpy array took ~42 seconds. The cause was that my input was a transposed numpy array. The transpose operation itself is very fast because it only creates a view, but the underlying memory keeps its original layout, which caused a lot of cache misses when the data was read in transposed order.
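The effect can be reproduced directly in numpy: a transpose only swaps strides, and np.ascontiguousarray makes an actual row-major copy before the buffer is handed to pyvips. A minimal illustration (not the original pipeline):

```python
import numpy as np

a = np.random.randint(0, 256, (1000, 2000), np.uint8)
t = a.T  # instant: only the strides change, no data is moved
assert not t.flags['C_CONTIGUOUS']

fixed = np.ascontiguousarray(t)  # one sequential copy into row-major order
assert fixed.flags['C_CONTIGUOUS']
# reading `fixed` row by row (as new_from_memory does) is now cache-friendly
```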

So currently the following line takes ~30 seconds (to write a ~200 MB file):

    vips_img.tiffsave(filename, tile=True, compression='lzw', bigtiff=True, pyramid=True, Q=80)

It would be nice if this could be faster, but it seems reasonable.

Code Example

In my case, only ~15% of the tiles are interesting and will be preprocessed; these are scattered all over the image. I would still like to save the result in a gigapixel format, as that allows me to use openslide to retrieve parts of the image through its convenient library. In the example I just generated ~15% random data to simulate the ratio of black to information, and the performance of the example is similar to the actual implementation, where the data is more scattered over the image.

import time

import numpy as np
import pyvips

def numpy2vips(a):
    # map numpy dtype names to pyvips band formats
    dtype_to_format = {
        'uint8': 'uchar',
        'int8': 'char',
        'uint16': 'ushort',
        'int16': 'short',
        'uint32': 'uint',
        'int32': 'int',
        'float32': 'float',
        'float64': 'double',
        'complex64': 'complex',
        'complex128': 'dpcomplex',
    }
    height, width, bands = a.shape
    linear = a.reshape(width * height * bands)
    vi = pyvips.Image.new_from_memory(linear.data, width, height, bands,
                                      dtype_to_format[str(a.dtype)])
    return vi

top = np.random.randint(0, 256, (7500, 45000), np.uint8)  # ~15% random data
bottom = np.zeros((50000, 45000), np.uint8)               # rest stays black
img = np.vstack((top, bottom))
vips_img = numpy2vips(np.expand_dims(img, axis=2))

start = time.time()
vips_img.tiffsave("t1", tile=True, compression='deflate', bigtiff=True, pyramid=True)
print("pyramid deflate took: ", time.time() - start)

start = time.time()
vips_img.tiffsave("t2", tile=True, compression='lzw', bigtiff=True, pyramid=True)
print("pyramid lzw took: ", time.time() - start)

start = time.time()
vips_img.tiffsave("t3", tile=True, compression='jpeg', bigtiff=True, pyramid=True)
print("pyramid jpg took: ", time.time() - start)

start = time.time()
vips_img.dzsave("t4", tile_size=256, depth='one', overlap=0, suffix='.jpg[Q=75]')
print("dzi took: ", time.time() - start)

output

pyramid deflate took:  32.69183301925659
pyramid lzw took:  32.10764741897583
pyramid jpg took:  59.79427194595337

I did not wait for the dzsave to finish, as it was taking more than a couple of minutes.

duocorn
  • Welcome to Stackoverflow! To help others achieving a better solution for your problem please share more information about your issue on your post, code snippets of what you have tried so far. – Gangula Apr 10 '20 at 10:29
  • Can you give some indication of the number of images please? And the pixel dimensions of the output file. Thank you. – Mark Setchell Apr 10 '20 at 11:11
  • Oh, also what OS do you use please? – Mark Setchell Apr 10 '20 at 11:29
  • MacOS on my laptop and linux for deployment – duocorn Apr 10 '20 at 11:35
    It's hard to give specific advice without a specific benchmark. I would make a small, self-contained program which does the thing you want to go faster and then ask for comments. Trying to optimise without something to optimise is just guesses. You could try 1) use deflate, not lzw, and a libz with a SIMD path, 2) do the transpose in pyvips (two flips) and it should get parallelized, 3) move more of your processing to pyvips (it's usually faster than numpy), 4) use dzi rather than a pyr tiff (you can write tiles in parallel) – jcupitt Apr 12 '20 at 04:36
  • Hi, thanks for your suggestions. I've posted an example in the original post, where I also timed your suggestions, but sadly they don't seem to work for this specific example. – duocorn Apr 13 '20 at 09:45

1 Answer


I tried your test program on my laptop (Ubuntu 19.10) and I see:

pyramid deflate took:  35.757954359054565
pyramid lzw took:  42.69455623626709
pyramid jpg took:  26.614688634872437
dzi took:  44.16632699966431

I'd guess you are not using libjpeg-turbo, the SIMD libjpeg fork. Unfortunately it's very difficult to install on macOS, due to brew being stuck on the non-SIMD version, but it should be easy on your deployment system, just install the libjpeg-turbo package instead of libjpeg (they are binary compatible).

There are various similar projects for zlib that should speed up deflate compression dramatically.

jcupitt
  • Thank you for testing, I'll definitely look into libjpeg-turbo. I just noticed that the output file contains 10 different zoom levels, which is more than I require, and I wonder if reducing that would allow for faster saving. In the docs I noticed a depth argument for tiffsave, but when using it in Python or via the vips command line tool, the depth argument is not known. `vips_img.tiffsave("depth3.tif", tile=True, compression='deflate', bigtiff=True, pyramid=True, depth=3)` gives a KeyError for depth. Would you know of another way to save fewer layers in the pyramid? – duocorn Apr 15 '20 at 08:53
  • The top few pyramid layers are tiny, I don't think they add any significant amount of processing time. `depth` was added in libvips 8.10, which is not out yet. It lets you select shrink-to-one-pixel, shrink-to-one-tile, only-one-layer. – jcupitt Apr 15 '20 at 10:20
  • I'm wondering if you would still know a way to circumvent creating them, or how to delete layers from the pyramid. My source image only has 3 zoom levels (increments in powers of 4 and skipping high zoom levels), but using vips, the same sized image has 10 zoom levels. For my specific use-case it would be very convenient if the zoom levels and number of zoom levels are identical to the source image. – duocorn May 06 '20 at 11:49