3

I have an application where I am loading raster data into a Dask array and then I only need to process the chunks which overlap with some region of interest. I know that I can create a Dask masked array, but I am looking for a way to prevent certain chunks from being processed at all – as some of the ROIs contain multiple polygons which are very far apart and thus 90% of the chunks will be discarded in the end.

A simple example would be, as below, where arr2 contains no information at all, but is needed for alignment of the other chunks.

import numpy as np
import dask.array as da

arr0 = da.from_array(np.arange(1, 26).reshape(5,5), chunks=(5, 5))
arr1 = da.from_array(np.arange(25, 50).reshape(5,5), chunks=(5, 5))
arr2 = da.from_array(np.zeros((5,5)), chunks=(5, 5)) 
arr3 = da.from_array(np.arange(75, 100).reshape(5,5), chunks=(5, 5))

a = da.block([[arr0, arr1],[arr2, arr3]])
b = da.ma.masked_equal(a, 0)
c = da.min(b)
c.visualize()

We can see by plotting the graph that arr2 is still in the computational graph, furthermore, it is taking up memory as it will still be evaluated even though it is masked. What I'd like to achieve is a way to mask the entire chunk/block such that it is just ignored in computation all together.enter image description here

System123
  • 523
  • 5
  • 14
  • 1
    Can you provide some more details / code? E.g. _how_ do you load the data using what libraries? – Val May 06 '21 at 07:43
  • How I'm loading the data is irrelevant to the question, I have added some more details about the code – System123 May 13 '21 at 06:55
  • I think you oughta let the people who want to help you decide what's relevant for them to understand the problem. For instance, understanding what the underlying data is and how it is loaded can be interesting for chunking – Val May 13 '21 at 11:05
  • In that case, the data is geospatial raster data (proprietary format) loaded in a block-wise manner where each block/spatial region directly maps to a chunk in dask as chunks have different projections. Null chunks are needed in order to ensure accurate geospatial alignment. – System123 May 14 '21 at 12:40

0 Answers0