
I am trying to see whether I can use Dask to parallelize, block by block, the detection and segmentation of objects in massive 2D images (~20-50 GB) on a cluster.

My logic to detect/segment objects in an image block will be encapsulated in a function.

I came across a Dask function called map_blocks that lets me apply a custom function on each block/chunk of a dask array.

However, I see that the function I pass to map_blocks is expected to return an array as well.

For object detection/segmentation, I want my function to return the coordinates of the bounding contour of each object detected in the block. Note that the number of objects in any block is unknown in advance and depends on the image.

How can I solve this use case with map_blocks or something else in Dask?

cdeepakroy

2 Answers

1

For more custom computations, I recommend dask.delayed, which lets you parallelize fairly generic Python code.

If you have a dask.array, you can turn it into a grid of delayed objects, one per block, with the .to_delayed() method:

blocks = x.to_delayed()

You can then run arbitrary functions on these blocks however you like.

import dask

@dask.delayed
def process_block(block):
    ...

# One delayed task per block, laid out in the same grid as the chunks of x.
blocks = [[process_block(block) for block in row]
          for row in x.to_delayed().tolist()]
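As a minimal sketch of how this could look end to end: a delayed function can return any Python object (here, a list of coordinate tuples), and the block's global offset can be passed in explicitly so results come back in image space rather than block space. The helper block_offsets, the extra row0/col0 parameters, the synthetic image, and the "every nonzero pixel is an object" detector are all assumptions made for illustration, not part of Dask or of the original answer.

```python
import itertools

import dask
import dask.array as da
import numpy as np


def block_offsets(chunks):
    """For each axis, return the global start index of every chunk on that axis."""
    return [[0] + list(itertools.accumulate(sizes))[:-1] for sizes in chunks]


@dask.delayed
def process_block(block, row0, col0):
    # Hypothetical stand-in detector: treat every nonzero pixel as one "object".
    rows, cols = np.nonzero(block)
    # Shift block-local coordinates into global image coordinates.
    return [(int(r) + row0, int(c) + col0) for r, c in zip(rows, cols)]


# Small synthetic image with two marked pixels, split into a 2x2 grid of blocks.
image = np.zeros((4, 6), dtype=np.uint8)
image[1, 1] = 1
image[3, 5] = 1
x = da.from_array(image, chunks=(2, 3))

# x.chunks holds the chunk sizes per axis; turn them into start offsets.
row_starts, col_starts = block_offsets(x.chunks)

tasks = [
    process_block(block, row_starts[i], col_starts[j])
    for i, row in enumerate(x.to_delayed().tolist())
    for j, block in enumerate(row)
]

# Each task returns a variable-length list; flatten them into one sorted list.
per_block = dask.compute(*tasks)
detections = sorted(pt for block_result in per_block for pt in block_result)
```

Passing the offsets as plain arguments keeps process_block independent of how the array was chunked, at the cost of computing the offsets yourself from x.chunks.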
MRocklin
  • Thanks for the answer. In the solution you suggest, is process_block free to return any type of object? And is it possible for the process_block function to know the block_id? – cdeepakroy Nov 22 '16 at 19:44
  • block_id would be needed so I can have my object detection function generate the bounding contour coords of the detected objects in global space instead of block space. – cdeepakroy Nov 22 '16 at 19:46
  • @MRocklin can you elaborate on this one as well? https://stackoverflow.com/questions/56586748/generating-batches-of-images-in-dask – enterML Jun 14 '19 at 10:45
1

You could use an object array as output, with a chunk shape of (1, 1). Be sure to pass dtype='object' to your map_blocks call, along with chunks=(1, 1) so that Dask knows the shape of each output block. Inside the mapped function, you then create a (1, 1)-shaped object array and store the list of coordinates at index (0, 0). Like this:

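import dask.array as da
import numpy as np

def find_objects(block):
    # Run the detection logic on `block` here; it should produce
    # coordinate_list, a Python list of contour coordinates.
    result = np.empty((1, 1), dtype='object')
    result[0, 0] = coordinate_list
    return result

# chunks=(1, 1) tells Dask that every output block has shape (1, 1).
da_coords = da.map_blocks(find_objects, da_image, dtype='object', chunks=(1, 1))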
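A runnable sketch of the same idea, which also uses the block_info keyword that map_blocks passes to the mapped function when it declares that parameter, so that coordinates can be reported in global image space. The synthetic image and the "every nonzero pixel is an object" detector are placeholders assumed for illustration; a real detector would replace them.

```python
import dask.array as da
import numpy as np


def find_objects(block, block_info=None):
    # block_info[0] describes this block of the first input array;
    # "array-location" holds its global (start, stop) range along each axis.
    (row0, _), (col0, _) = block_info[0]["array-location"]

    # Hypothetical stand-in detector: treat every nonzero pixel as one "object".
    rows, cols = np.nonzero(block)
    coords = [(int(r) + row0, int(c) + col0) for r, c in zip(rows, cols)]

    # Wrap the variable-length list in a (1, 1) object array so that the
    # function still returns an array, as map_blocks expects.
    result = np.empty((1, 1), dtype=object)
    result[0, 0] = coords
    return result


# Small synthetic image with two marked pixels, split into a 2x2 grid of blocks.
image = np.zeros((4, 6), dtype=np.uint8)
image[1, 1] = 1
image[3, 5] = 1
da_image = da.from_array(image, chunks=(2, 3))

# One output element per input block: the result has shape (2, 2).
da_coords = da.map_blocks(find_objects, da_image, dtype=object, chunks=(1, 1))
coords_grid = da_coords.compute()

# Flatten the per-block lists into a single list of global coordinates.
all_coords = sorted(pt for cell in coords_grid.ravel() for pt in cell)
```

The result keeps the block grid layout, so coords_grid[i, j] is the list of objects found in block (i, j); flatten it only if the per-block grouping is not needed.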