Add nan buffer to xarray dataset

Question

I have an xarray Dataset that acts as a mask for a different dataset. I'd like to create a buffer (of a configurable distance) around any NaN values in the mask. I haven't found anything that adds the buffer inside the array rather than expanding the array with padded values. Below is some reproducible code to show what I mean (my real datasets have tens of thousands of x/y coordinates):

import numpy as np
import xarray as xr

nan = np.nan  # alias so the literal below runs as written

data = [[ 0.,  1.,  2.,  3., nan],
        [ 0.,  6.,  4., nan, nan],
        [ 4.,  3.,  6.,  4., nan],
        [ 1.,  0.,  3.,  4., nan]]
y = [0, 1, 2, 3]
x = [0, 1, 2, 3, 4]
test = xr.Dataset({'band': xr.DataArray(data, coords=[y, x], dims=['y', 'x'])})

I'd like to create a dataset where if I supplied a distance of 1, the above would look like this:

[[ 0.,  1.,  2., nan, nan],
 [ 0.,  6., nan, nan, nan],
 [ 4.,  3.,  6., nan, nan],
 [ 1.,  0.,  3., nan, nan]]

Ideally the buffer distance would be configurable. I tried downsampling the image and then upsampling the result, but that was very slow and hard to get working properly, so I wanted to check whether I'm missing a better option.

jhamman · Answer 1 · 2023-01-06T23:22:51.813

You can combine Xarray's shift and where methods to achieve this behavior:

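buffer = -1  # negative shift: each cell is compared with the neighbour to its right
mask = test.shift({'x': buffer}).notnull()  # False where that neighbour is NaN
test.where(mask)  # keep values only where the mask is True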

This will produce a band variable that looks like:

<xarray.DataArray 'band' (y: 4, x: 5)>
array([[ 0.,  1.,  2., nan, nan],
       [ 0.,  6., nan, nan, nan],
       [ 4.,  3.,  6., nan, nan],
       [ 1.,  0.,  3., nan, nan]])
Coordinates:
  * y        (y) int64 0 1 2 3
  * x        (x) int64 0 1 2 3 4
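One thing to watch with a single shift: it only looks at the cell exactly `buffer` steps away, and the cells shifted in at the edge are filled with NaN, so the far edge of the array is always masked. Here is a rough sketch of one way to generalise it to any distance, OR-ing together the null mask shifted by every step up to the distance. The names `buffer_nans`, `distance` and `dims` are made up for this example and are not part of xarray.

```python
import numpy as np
import xarray as xr


def buffer_nans(da, distance, dims=("x",)):
    """Set every cell within `distance` steps of a NaN (along `dims`) to NaN."""
    near_nan = da.isnull()
    for dim in dims:
        grown = near_nan
        for step in range(1, distance + 1):
            # fill_value=False keeps the array edges from counting as NaN
            grown = (
                grown
                | near_nan.shift({dim: step}, fill_value=False)
                | near_nan.shift({dim: -step}, fill_value=False)
            )
        near_nan = grown
    return da.where(~near_nan)


nan = np.nan
data = [[0., 1., 2., 3., nan],
        [0., 6., 4., nan, nan],
        [4., 3., 6., 4., nan],
        [1., 0., 3., 4., nan]]
test = xr.Dataset(
    {"band": xr.DataArray(data, coords=[[0, 1, 2, 3], [0, 1, 2, 3, 4]], dims=["y", "x"])}
)

buffered = buffer_nans(test["band"], distance=1)
```

With `dims=("x",)` this buffers along x only, in both directions. Passing `dims=("x", "y")` grows the mask along x and then y, which gives a square neighbourhood rather than a circular one.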

Edit 1:

If you only want to buffer from one edge, this method may work:

# bfill's limit must be a positive number of cells, hence abs(buffer)
mask = test.band.isnull().astype('f8')
mask2 = ~mask.where(mask).bfill(dim='x', limit=abs(buffer)).fillna(0).astype(bool)
test.band.where(mask2)

edited Jan 06 '23 at 23:22

answered Jan 06 '23 at 16:37

jhamman


I don't think the pad function does what I want it to do - it is adding nan values to the start and end of the x dimension, rather than maintaining the same size x dimension and changing next neighbour values to nans – JackLidge Jan 06 '23 at 17:03
Good point. Take a look at the updated answer! – jhamman Jan 06 '23 at 17:44
Having thought about this a bit more, I'm not 100% sure this is what you want. Do you always want to "push" nans from one "edge" to the other? I'm thinking there may be better ways to do this using a rolling window or `bottleneck.push`. – jhamman Jan 06 '23 at 23:00
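For what it's worth, the rolling-window idea from the last comment could look something like the sketch below. It is only an illustration under assumptions: `distance` and `window` are names chosen here, and the window is applied along x only. A centred rolling maximum over the 0/1 null mask marks every cell that has a NaN within `distance` cells on either side.

```python
import numpy as np
import xarray as xr

nan = np.nan
band = xr.DataArray(
    [[0., 1., 2., 3., nan],
     [0., 6., 4., nan, nan],
     [4., 3., 6., 4., nan],
     [1., 0., 3., 4., nan]],
    coords=[[0, 1, 2, 3], [0, 1, 2, 3, 4]],
    dims=["y", "x"],
)

distance = 1               # hypothetical buffer width, in cells
window = 2 * distance + 1  # centred window covering both sides

# 1.0 where the original value is NaN, 0.0 elsewhere
is_nan = band.isnull().astype(float)

# min_periods=1 lets the partial windows at the array edges still produce a value
near_nan = is_nan.rolling(x=window, center=True, min_periods=1).max()

# keep only the cells with no NaN inside their window
rolled = band.where(near_nan == 0)
```

Unlike the shift approach, this does not mask the far edge of the array unless there is actually a NaN nearby.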
