
I’m exploring 3D interactive volume convolution with some simple stencils using dask right now. Let me explain what I mean:

  • Assume that you have 3D data which you would like to process with a Sobel transform (for example, to get an L1 or L2 gradient).
  • Then you divide your input 3D image into subvolumes with some overlapping boundaries (for a 3×3×3 Sobel stencil, this demands a 1-sample overlap on each side, i.e. 2 extra samples of padding per axis).
  • Now let’s assume that you create a delayed computation of the 3D Sobel transform on the entire volume, without executing it yet (a sketch of this setup follows the list).
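
Concretely, something like this minimal sketch is what I have in mind (using dask.array's map_overlap and SciPy's ndimage.sobel; the sizes and names are purely illustrative):

import numpy as np
import dask.array as da
from scipy import ndimage

# Hypothetical input: a 256^3 volume split into 64^3 chunks
volume = da.random.random((256, 256, 256), chunks=(64, 64, 64))

def sobel_l2(block):
    # L2 gradient magnitude from per-axis 3x3x3 Sobel stencils
    grads = [ndimage.sobel(block, axis=i) for i in range(block.ndim)]
    return np.sqrt(sum(g * g for g in grads))

# map_overlap adds the 1-sample halo each chunk needs for the 3x3x3
# stencil, applies the function, and trims the halo off again.
# Nothing is computed yet -- `magnitude` is a lazy dask array.
magnitude = volume.map_overlap(sobel_l2, depth=1, boundary="reflect")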

And now the most important part:

  • I want to write a function which extracts a particular 2D section from the virtually transformed data.
  • And then finally let Dask compute everything:
    • But what I need Dask to do is not to compute the entire transform for me and then provide a section.
      • I need it to execute only those tasks that are needed to compute that particular transformed 2D image slice (see the sketch after this list).
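
Continuing the sketch above, the hoped-for behaviour would look like this:

# Extract a single z-plane from the virtually transformed volume.
# Ideally, only the chunks this plane touches (plus their halos)
# would ever be computed.
slice2d = magnitude[:, :, 64]  # still lazy
image = slice2d.compute()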

Do you think it’s possible?

To explain it with an image, please consider this 3D domain decomposition (it is from a DWT, but it serves well as an illustration here):

[illustration of domain decomposition]

And assume that there is a function which computes the 3D transform of the entire volume using Dask. But what I would like to get, for example, is a 2D image of the transformed 3D data which consists of the LLL1, LLH1, HLH1, and HLL1 planes (essentially a single slice).

The important part is not to compute the whole subcubes, but to let Dask automatically track the dependencies in the compute graph and evaluate only the tasks that are needed.

Please don’t worry about compute vs. copy time. Assume that it has a perfect ratio.

Let me know if more clarification is needed! Thanks for your help!

THOTH

2 Answers


I'm hearing a few questions. I'll answer each individually.

  • Can Dask track which tasks are required for a subset of outputs and only compute those?

Yes. Lazy Dask operations produce a dependency graph. In the case of dask.array this graph is per-chunk. If your output only depends on a subset of the graph then Dask will remove the tasks that are not necessary. The in-depth docs for this are in the graph optimization documentation, and the cull optimization in particular.
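
To see the cull step in isolation, here is a toy task graph (the keys and functions are made up purely for illustration):

from dask.optimization import cull

# A hand-written task graph: "unused" is not needed to produce "c"
dsk = {
    "a": 1,
    "b": (lambda x: x + 1, "a"),
    "c": (lambda x: x * 2, "b"),
    "unused": (lambda x: x - 1, "a"),
}

culled, dependencies = cull(dsk, keys=["c"])
assert "unused" not in culled  # cull dropped the unnecessary task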

As an example consider this 100,000 by 100,000 array

>>> import dask.array as da
>>> x = da.random.random((100000, 100000), chunks=(1000, 1000))

And let's say that I add together a couple of 1D slices from it

>>> y = x[5000, :] + x[:, 5000].T

The resulting optimized graph is only large enough to compute the output

>>> graph = y._optimize(y.dask, y._keys())  # you don't need to do this
>>> len(graph)                              # it happens automatically
301

And we can compute the result quite quickly:

In [8]: %time y.compute()
CPU times: user 3.18 s, sys: 120 ms, total: 3.3 s
Wall time: 936 ms
Out[8]: 
array([ 1.59069994,  0.84731881,  1.86923216, ...,  0.45040813,
        0.86290539,  0.91143427])

Now, this wasn't perfect. It did have to produce all of the 1000x1000 chunks that our two slices touched. But you can control the granularity there through your choice of chunk sizes, as in the sketch below.
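
For instance, here is a sketch with an assumed alternative chunking that is thin along the sliced axis, so that a single row slice touches far less data:

import dask.array as da

# Thin chunks along axis 0: a single row slice now depends on only
# 10 chunks of shape (100, 10000), rather than 100 chunks of 1000x1000
x = da.random.random((100000, 100000), chunks=(100, 10000))
row = x[5000, :]
result = row.compute()  # only the 10 touched chunks are ever generated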

Short answer: Dask will automatically inspect the graph and only run those tasks that are necessary to evaluate the output. You don't need to do anything special to do this.

  • Is it a good idea to do overlapping array computations with dask.array?

Maybe. The relevant doc page is the one on Overlapping Blocks with Ghost Cells. Dask.array has convenience functions to make this easy to write down; a sketch of what they do follows below. However, it will create in-memory copies. Many people in your position find memcopy too slow. Dask generally doesn't support in-place computation, so we can't be as efficient as proper MPI code. I'll leave the performance question here to you, though.
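
For a rough idea of what those convenience functions do, here is a sketch (the names assume the current dask.array API, where the ghost-cell module is called dask.array.overlap):

import dask.array as da
from dask.array.overlap import overlap, trim_internal

x = da.random.random((120, 120, 120), chunks=(40, 40, 40))

# Expand every chunk with a 1-sample halo copied from its neighbours;
# these copies are the in-memory overhead mentioned above
g = overlap(x, depth={0: 1, 1: 1, 2: 1},
            boundary={0: "reflect", 1: "reflect", 2: "reflect"})

# Stand-in for the real stencil, applied to each enlarged chunk
filtered = g.map_blocks(lambda block: block)

# Trim the halos back off so the chunks align with the original array
result = trim_internal(filtered, axes={0: 1, 1: 1, 2: 1})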

MRocklin

Not to detract from the nicely laid out answer from @MRocklin, but more to add to it.

I also regularly find myself needing to do things like edge detection and other image processing techniques on large-scale array data. As Dask is a very nice library for constructing and exploring such computational workflows on large array data, I have put together some utility libraries for common image processing techniques in a GitHub organization called dask-image. They have largely been designed to mimic SciPy's ndimage API.

As for using a Sobel operator with Dask, one can use the sobel function from dask-ndfilters (permissively licensed) to perform this operation on a Dask array. It handles proper haloing of the blocks under the hood and returns a new Dask array.

As SciPy's sobel function (and dask-ndfilters' sobel as well) operates on one dimension, one will need to map over each axis and stack the results to get the full Sobel operator result. That said, this is quite straightforward to do. Below is a brief snippet showing how to do this on a random Dask array, including taking a slice along the XZ-plane. One could just as easily take any other slice or perform additional operations on the data.

Hope this helps. :)

import dask.array as da
import dask_ndfilters as da_ndfilt

# Random 3D volume, chunked into subvolumes
d = da.random.random((100, 120, 140), chunks=(25, 30, 35))

# Apply the Sobel filter along each axis and stack the results;
# axis 0 of `ds` indexes which axis was filtered
ds = da.stack([da_ndfilt.sobel(d, axis=i) for i in range(d.ndim)])

# Slice along the XZ-plane (y = 0); still lazy until compute()
dsp = ds[:, :, 0, :]
asp = dsp.compute()
jakirkham