From this and this question, it seems that the Dask-ish way to program a stencil is to use `dask.array.map_blocks` together with `dask.array.ghost`, or `dask.array.map_overlap`. So I have the following code:
```python
import math
import dask.array as da

def local_stencil(block):
    # 7-point average over the interior; the one-cell halo is left untouched.
    block[1:-1, 1:-1, 1:-1] = (
        block[1:-1, 1:-1, 1:-1]
        + block[0:-2, 1:-1, 1:-1] + block[2:, 1:-1, 1:-1]
        + block[1:-1, 0:-2, 1:-1] + block[1:-1, 2:, 1:-1]
        + block[1:-1, 1:-1, 0:-2] + block[1:-1, 1:-1, 2:]
    ) / 7.0
    return block

def stencil(grid, iterations, workers):
    g = da.ghost.ghost(grid, depth={0: 1, 1: 1, 2: 1},
                       boundary={0: 0, 1: 1, 2: 0})
    chunk_size = int(math.pow(g.shape[0] ** 3 / workers, 1 / 3))
    for iteration in range(iterations):
        g = da.from_array(g.map_blocks(local_stencil).compute(),
                          chunks=chunk_size)
    return da.ghost.trim_internal(g, {0: 1, 1: 1, 2: 1})
```
I don't know why, but compared to a NumPy version of the same stencil function (`local_stencil`), it performs considerably worse.
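For reference, the NumPy version I am comparing against is essentially the same kernel applied repeatedly to a plain ndarray (a sketch; the function name `stencil_np` is just illustrative):

```python
import numpy as np

def local_stencil_np(block):
    # Same 7-point average as local_stencil, on a plain NumPy array.
    block[1:-1, 1:-1, 1:-1] = (
        block[1:-1, 1:-1, 1:-1]
        + block[0:-2, 1:-1, 1:-1] + block[2:, 1:-1, 1:-1]
        + block[1:-1, 0:-2, 1:-1] + block[1:-1, 2:, 1:-1]
        + block[1:-1, 1:-1, 0:-2] + block[1:-1, 1:-1, 2:]
    ) / 7.0
    return block

def stencil_np(grid, iterations):
    # Apply the kernel in place, once per iteration.
    for _ in range(iterations):
        grid = local_stencil_np(grid)
    return grid
```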
Are there any changes I can make to my code to improve performance?
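One variation I am considering is keeping the whole iteration lazy with `map_overlap`, so the halo exchange happens inside the graph and `compute()` is called only once at the end (a sketch based on my reading of the `map_overlap` docs; I have not verified it is actually faster):

```python
import numpy as np
import dask.array as da

def local_stencil(block):
    # 7-point average over the interior; the one-cell halo is left untouched.
    block = block.copy()  # avoid mutating the input block in place
    block[1:-1, 1:-1, 1:-1] = (
        block[1:-1, 1:-1, 1:-1]
        + block[0:-2, 1:-1, 1:-1] + block[2:, 1:-1, 1:-1]
        + block[1:-1, 0:-2, 1:-1] + block[1:-1, 2:, 1:-1]
        + block[1:-1, 1:-1, 0:-2] + block[1:-1, 1:-1, 2:]
    ) / 7.0
    return block

def stencil_lazy(grid, iterations):
    g = grid
    for _ in range(iterations):
        # map_overlap exchanges a one-cell halo each sweep and trims it
        # again, so the graph stays lazy until compute().
        g = g.map_overlap(local_stencil, depth=1, boundary=0)
    return g  # call .compute() once on the result

result = stencil_lazy(da.ones((16, 16, 16), chunks=(8, 8, 8)), 2).compute()
```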
From this other answer, I understand how Dask is helpful when you have a file larger than RAM, but could Dask also help with compute-bound operations like matrix multiplication or convolution?
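By compute-bound I mean something like the following blocked matrix multiplication, where the `@` operator builds a graph of per-chunk `np.matmul` calls that the scheduler can run on several cores (a minimal sketch; sizes and chunk shapes are arbitrary):

```python
import numpy as np
import dask.array as da

# Two chunked random matrices; each 128x128 chunk is a NumPy array.
n = 512
a = da.random.random((n, n), chunks=(128, 128))
b = da.random.random((n, n), chunks=(128, 128))

# The matmul is lazy until compute(); the contraction is split into
# per-chunk products that can execute in parallel.
c = (a @ b).compute()
```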