
On many occasions, scientists simulate a system's dynamics using a stencil, that is, by convolving a mathematical operator over a grid. This operation commonly consumes a lot of computational resources. Here is a good explanation of the idea.

In numpy, the canonical way of programming a 2D 5-point stencil is as follows:

for i in range(1, rows - 1):
    for j in range(1, cols - 1):
        grid[i, j] = (grid[i, j] + grid[i-1, j] + grid[i+1, j]
                      + grid[i, j-1] + grid[i, j+1]) / 5

Or, more efficiently, using slicing:

grid[1:-1,1:-1] = ( grid[1:-1,1:-1] + grid[0:-2,1:-1] + grid[2:,1:-1] + grid[1:-1,0:-2] + grid[1:-1,2:] ) / 5

However, if your grid is really big it won't fit in your memory, or if the convolution operation is really complicated it will take a very long time. Parallel programming techniques are used to overcome these problems, or simply to get the result faster. Tools like Dask allow scientists to program these simulations themselves, in an almost transparent parallel manner. Currently, Dask doesn't support item assignment, so how can I program a stencil with Dask?

2 Answers


Nice question. You're correct that dask.array provides parallel computing but doesn't support item assignment. We can solve stencil computations by making a function that operates on one block of numpy data at a time and then mapping that function across our array with slightly overlapping boundaries.

Pure Functions

You should make a function that takes a numpy array and returns a new numpy array with the stencil applied. This should not modify the original array.

import numpy as np

def apply_stencil(x):
    out = np.empty_like(x)  # allocate a fresh array rather than modifying x
    ...                     # do arbitrary computations on out
    return out
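
For example, a minimal sketch of such a pure function for the 5-point stencil in the question (the name five_point_mean is just for illustration):

import numpy as np

def five_point_mean(x):
    # average each interior cell with its four neighbours,
    # writing into a new array instead of modifying x
    out = x.copy()
    out[1:-1, 1:-1] = (x[1:-1, 1:-1] + x[:-2, 1:-1] + x[2:, 1:-1]
                       + x[1:-1, :-2] + x[1:-1, 2:]) / 5
    return out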

Map a function with overlapping regions

Dask arrays operate in parallel by breaking an array into many smaller disjoint chunks. Operations like stencil computations require a bit of overlap between neighboring blocks. Fortunately this can be handled with the dask.array.ghost module, and the dask.array.map_overlap method in particular.

Actually, the example in the map_overlap docstring is a 1-D finite difference computation:

>>> import numpy as np
>>> from dask.array import from_array
>>> x = np.array([1, 1, 2, 3, 3, 3, 2, 1, 1])
>>> x = from_array(x, chunks=5)
>>> def derivative(x):
...     return x - np.roll(x, 1)
>>> y = x.map_overlap(derivative, depth=1, boundary=0)
>>> y.compute()
array([ 1,  0,  1,  1,  0,  0, -1, -1,  0])
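
Putting the pieces together for the 2-D case in the question might look like the following sketch, reusing the hypothetical five_point_mean function from the Pure Functions section above:

import dask.array as da

x = da.random.random((1000, 1000), chunks=(500, 500))
# grow each chunk by a one-cell border taken from its neighbours,
# padding the outer edge of the whole array with zeros
y = x.map_overlap(five_point_mean, depth=1, boundary=0)
result = y.compute()  # an ordinary numpy array
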
MRocklin

Dask internally divides arrays into smaller numpy arrays. When you create an array with dask.array, you must provide some information about how to divide it into chunks, like this:

grid = dask.array.zeros((100,100), chunks=(50,50))

That requests a 100 x 100 array divided into four 50 x 50 chunks. Now, to convolve an operation over the newly created array, information at the chunks' borders must be shared. Dask's ghost cells manage situations like this.
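
As a quick check of how the array was split (just illustrative):

import dask.array

grid = dask.array.zeros((100, 100), chunks=(50, 50))
print(grid.chunks)     # ((50, 50), (50, 50))
print(grid.numblocks)  # (2, 2) -> four 50 x 50 blocks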

A common workflow involves:

  1. Creating the array (if it didn't exist before)
  2. Commanding the creation of ghost cells
  3. Mapping a calculation
  4. Trimming the borders

For example,

import dask.array as da

grid = da.zeros((100, 100), chunks=(50, 50))
# add a one-cell ghost border around every chunk
g = da.ghost.ghost(grid, depth={0: 1, 1: 1}, boundary={0: 0, 1: 1})
# apply some_function to each block; it must take and return a numpy array
g2 = g.map_blocks(some_function)
# trim the ghost borders away again
s = da.ghost.trim_internal(g2, {0: 1, 1: 1})
s.compute()

Remember that Dask only builds a dictionary representing the task graph; the real computation is triggered by s.compute(). As noted by MRocklin, the mapped function must return a numpy array.
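
As a small illustration of that laziness:

import dask.array as da

lazy = da.zeros((100, 100), chunks=(50, 50)) + 1
print(type(lazy))            # dask.array.core.Array -- still just a task graph
print(type(lazy.compute()))  # numpy.ndarray -- the materialised result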

A note about schedulers

By default, dask.array uses the dask.threaded scheduler to boost performance, but once the border information has been shared, problems like a stencil become embarrassingly parallel: no resources or information need to be shared, and the computations can be mapped to different cores or even different computers. To do this, one can instruct Dask to use a different scheduler, for example dask.multiprocessing:

import dask.multiprocessing
import dask

dask.set_options(get=dask.multiprocessing.get)
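
The scheduler can also be selected for a single call instead of globally, assuming the same old-style get interface used above:

s.compute(get=dask.multiprocessing.get)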

When compute() is triggered, Dask will create multiple instances of Python. If your application is big enough to pay the overhead of creating these new instances, dask.multiprocessing may deliver better performance. More information about Dask schedulers can be found here.