3

I am using xarray/rasterio to do some operations on a set of GeoTiff files. The data that I am dealing with contain missing values (-9999) at some grid points. Is there any efficient pythonic way to replace these values with interpolated data using xarray?

For example I have something like:

>>> da = xr.DataArray([[1.0,2.0,3.0],[4.0,-999.0,6.0],[7.0,-999.0,9.0]],[('x',[1,2,3]),('y',[1,2,3])])
>>> da
<xarray.DataArray (x: 3, y: 3)>
array([[   1.,    2.,    3.],
       [   4., -9999.,    6.],
       [   7., -9999.,    9.]])
Coordinates:
  * x        (x) int64 1 2 3
  * y        (y) int64 1 2 3

and I want to replace -9999 values with the interpolated values based on adjacent grid points. Any hint would be appreciated!

Monobakht
  • 193
  • 1
  • 11

3 Answers3

1

You can use da.interpolate_na() to do one dimensional interpolation of NA values - I am unsure if there is a two dimensional method.

You can convert your -999 values to NaN's and then interpolate.

da = xr.DataArray([[1.0,2.0,3.0],[4.0,-999.0,6.0],[7.0,-999.0,9.0]],[('x',[1,2,3]),('y',[1,2,3])])
da

<xarray.DataArray (x: 3, y: 3)>
array([[   1.,    2.,    3.],
       [   4., -999.,    6.],
       [   7., -999.,    9.]])
Coordinates:
  * x        (x) int64 1 2 3
  * y        (y) int64 1 2 3

new_da = da.where(da != -999.0, np.nan)
new_da

<xarray.DataArray (x: 3, y: 3)>
array([[ 1.,  2.,  3.],
       [ 4., nan,  6.],
       [ 7., nan,  9.]])
Coordinates:
  * x        (x) int64 1 2 3
  * y        (y) int64 1 2 3

new_da.interpolate_na(dim=('y'), method='linear')


<xarray.DataArray (x: 3, y: 3)>
array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])
Coordinates:
  * x        (x) int64 1 2 3
  * y        (y) int64 1 2 3

Since it's a 1-D interpolation, you can just interpolate for each dimension separately.

bwc
  • 1,028
  • 7
  • 18
1

For 2D interpolation you could use rioxarray.

MuellerSeb
  • 796
  • 6
  • 11
0

I still do not think there is an efficient way to do this in xarray. The suggestion by @bwc will not work properly for multidimensional arrays, because the first pass along the first dimension will already fill all gaps. This may be acceptable if your missing values are just single datapoints, but it will result in unwanted behaviour if you have larger gaps.

The simplest solution I'm aware of for your problem is using the setmisstonn or setmisstodis operator in CDO (depending on how you want to interpolate).