I have a reasonably large 1000 x 4000 pixel xr.DataArray returned from an OpenDataCube query, and a large set (> 200,000) of xy point values. I need to sample the array to return the value under each xy point, with values interpolated between pixels (e.g. if a point lands halfway between a 0.0 and a 1.0 pixel, the value returned should be 0.5).
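A minimal 1-D illustration of that expectation (toy values, not my real array; interp defaults to linear interpolation):

import xarray as xr

# A point halfway between a 0.0 pixel and a 1.0 pixel should return 0.5
da = xr.DataArray([0.0, 1.0], coords={'x': [0.0, 1.0]}, dims='x')
print(da.interp(x=0.5).item())  # 0.5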
xarray's interp method lets me easily sample interpolated values, but it returns a huge matrix of every combination of all the x and y values, rather than just the value for each xy point itself. I've tried using np.diagonal to extract just the xy point values, but this is slow, very quickly runs into memory issues, and feels inefficient given I still need to wait for every combination of values to be interpolated via interp.
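A tiny sketch of that combinatorial behaviour (toy 3 x 3 array; with plain 1-D arrays, interp treats the indexers orthogonally and builds the full grid):

import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((3, 3)),
                  coords={'x': [0.0, 1.0, 2.0], 'y': [0.0, 1.0, 2.0]},
                  dims=['y', 'x'])
xp = np.array([0.5, 1.5])
yp = np.array([0.5, 1.5])

# Returns a (2, 2) grid of every y/x combination, not a (2,) vector of
# values for each (x, y) pair
print(da.interp(x=xp, y=yp).shape)  # (2, 2)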
Reproducible example
(using just 10,000 sample points; ideally, I need something that can scale to 200,000 points or more):
import numpy as np
import xarray as xr

# Create sample array
width, height = 1000, 4000
val_array = xr.DataArray(data=np.random.randint(0, 10, size=(height, width)).astype(np.float32),
                         coords={'x': np.linspace(3000, 5000, width),
                                 'y': np.linspace(-3000, -5000, height)},
                         dims=['y', 'x'])
# Create sample points
n = 10000
x_points = np.random.randint(3000, 5000, size=n)
y_points = np.random.randint(-5000, -3000, size=n)
Current approach
%%timeit
# ATTEMPT 1: interpolate every combination of x and y (an n x n grid),
# then extract the diagonal, which holds the value for each xy pair
np.diagonal(val_array.interp(x=x_points, y=y_points).squeeze().values)
32.6 s ± 1.01 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
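(For scale: at n = 200,000 the intermediate grid alone is 200,000 x 200,000 values, i.e. roughly 160 GB even at float32, before the diagonal is taken.)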
Does anyone know of a faster or more memory-efficient way to achieve this?
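One direction that might avoid the full grid (a sketch only; the 'points' dimension name is arbitrary, and I haven't verified this at 200,000+ points): xarray's advanced interpolation samples pointwise when the indexers are DataArrays that share a common dimension:

# Wrap the points in DataArrays sharing a new common dimension so that
# interp samples pointwise along it instead of building the n x n grid
x_idx = xr.DataArray(x_points, dims='points')
y_idx = xr.DataArray(y_points, dims='points')
sampled = val_array.interp(x=x_idx, y=y_idx)  # shape (n,): one value per xy point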