
I have an xarray DataArray with three dimensions (time, x and y); it is basically a stack of images. I want to operate on pairs of images in a "windowed" manner.

Basically, take a pair (say the first two "time" slices) and call a function on a 5x5 window over them. In numpy I would do this by slicing appropriately, first forming the "patches":

params = []
# Collect an N x N window from each image at every valid pixel position
for i in range(imga.shape[0] - N + 1):
    for j in range(imga.shape[1] - N + 1):
        window1 = np.copy(imga[i:i+N, j:j+N]).flatten()
        window2 = np.copy(imgb[i:i+N, j:j+N]).flatten()
        params.append((window1, window2))

Here N is the window size, 5 for instance. Then computing my metric:

from multiprocessing import Pool

import tqdm

def f(param):
    return metric_function(*param)

with Pool(4) as p:
    r = list(tqdm.tqdm(p.imap(f, params), total=len(params)))

However, I have difficulty translating this to dask and I would like some help. My first intuition is to use the map_overlap function, but I don't think I fully understand how to, especially because the output has a different shape from the input; i.e. the output would only be the center pixel of each NxN patch.

shaunakde
    Sounds like what you're trying to do is a stencil, and https://stackoverflow.com/questions/40117237/how-to-programm-a-stencil-with-dask might help? – Dominik Stańczak Jan 12 '21 at 10:48
  • It's close! And definitely helps me get some part of the way. I think via experimentation, I have managed to come up with a solution. Let me post that here as an answer, as I wait for more responses! – shaunakde Jan 12 '21 at 18:51
  • 1
    also consider `map_overlap` https://docs.dask.org/en/latest/array-overlap.html#dask.array.map_overlap – mdurant Jan 14 '21 at 20:37
  • @mdurant With a combination of map_overlap and `skimage.util.shape.view_as_windows` (and looping over the blocks) I was able to achieve my objective in a reasonably fast way. I'll update my answer below to refect that! – shaunakde Jan 15 '21 at 03:36

1 Answer


With a little experimentation, I was able to come up with a way to perform this operation.

I rewrote my distance metric to be dask-compatible (i.e. I added the range parameter to the histogram call, since dask.array.histogram requires it while numpy's histogram doesn't). This is the distance metric:

import numpy as np

def cauchy_schwartz(chunk):

    imga = chunk[0]
    imgb = chunk[1]

    p, _ = np.histogram(np.ravel(imga), bins=20, range=[imga.min(), imga.max()])
    p = p / np.sum(p)
    q, _ = np.histogram(np.ravel(imgb), bins=20, range=[imgb.min(), imgb.max()])
    q = q / np.sum(q)

    n_d = np.sum(p * q)
    d_d = np.sqrt(np.sum(np.power(p, 2)) * np.sum(np.power(q, 2)))
    # Return a (1, 1, 1) array so map_blocks can stitch one value per block
    return np.array([-1.0 * np.log10(n_d / d_d)])[None, None, None]

The key here was to return an array with shape (1, 1, 1), so that map_blocks can reassemble the per-block results into an output array.
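To illustrate why the (1, 1, 1) shape matters, here is a minimal, self-contained sketch (with a toy mean statistic standing in for the real metric, and made-up array sizes): when every chunk reduces to a (1, 1, 1) block, map_blocks stitches one value per chunk into a small output grid.

```python
import numpy as np
import dask.array as da

# Toy example: a (2, 18, 18) stack chunked into (2, 9, 9) blocks.
x = da.from_array(np.random.rand(2, 18, 18), chunks=(2, 9, 9))

def per_block_stat(chunk):
    # Reduce the whole chunk to a single value, shaped (1, 1, 1) so
    # map_blocks can place one value per block in the output grid.
    return np.array(chunk.mean())[None, None, None]

out = da.map_blocks(per_block_stat, x, chunks=(1, 1, 1), dtype="float64").compute()
print(out.shape)  # one value per (2, 9, 9) block -> (1, 2, 2)
```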

Then I simply chunked my data to the desired window size (9 in this case):

dcube2 = dcube.chunk((2,9,9)).persist()

[Image: datacube chunked to window size]

And I was able to process each 9x9 window like this:

output = da.map_blocks(cauchy_schwartz, dcube2.data, chunks=(1,1,1), dtype='float64').compute()

[Image: output]

I am experimenting with map_overlap and its depth parameter to do this in an overlapping manner.
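A rough sketch of what that overlapping version could look like, under stated assumptions: the metric here is a toy per-pixel neighbourhood mean (not the actual cauchy_schwartz), the array sizes are made up, and each block is padded by half the window size via depth so every pixel sees its full 9x9 neighbourhood.

```python
import numpy as np
import dask.array as da

N = 9            # window size
half = N // 2    # overlap depth on each spatial edge

def windowed_metric(block):
    # Each block arrives padded by `half` pixels on the spatial edges.
    # Compute a toy statistic (mean) over each pixel's neighbourhood;
    # map_overlap trims the padding off the result afterwards.
    t, h, w = block.shape
    out = np.empty((t, h, w), dtype="float64")
    for i in range(h):
        for j in range(w):
            win = block[:, max(i - half, 0):i + half + 1,
                           max(j - half, 0):j + half + 1]
            out[:, i, j] = win.mean(axis=(1, 2))
    return out

x = da.from_array(np.random.rand(2, 36, 36), chunks=(2, 18, 18))
out = da.map_overlap(windowed_metric, x,
                     depth=(0, half, half), boundary="reflect",
                     dtype="float64").compute()
print(out.shape)  # same shape as the input: (2, 36, 36)
```

Unlike the per-block approach above, this produces one value per pixel rather than one per chunk, at the cost of the Python-level loop inside each block.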

This might not be the most efficient solution, but it's the best I could come up with.

shaunakde