2

I am working with climate data derived from a netCDF file. Data from different models come at different resolutions - because of this, it is necessary to "regrid" the models to a common grid resolution. The data is 3-D (time, latitude, longitude). To regrid, I linearly interpolate the old grid onto a new grid at each timestep.

I am looking for a way to improve the efficiency of process which loops through each timestep, because scipy.interpolate.interp2d can only handle two dimensions at a time.

Is there any way to efficiently linearly regrid / interpolate across two dimensions for a time series, without doing a for loop (as shown below)?

import numpy as np
import xarray as xr

#create xarray DataArray to establish resolution we want to regrid to.
ref_lat = np.linspace(-90,89,180)
ref_lon = np.linspace(0,359,360)
ref_grid = xr.DataArray(np.zeros((180,360)),coords=[('lat',ref_lat),('lon',ref_lon)])
x_new = ref_grid.lon
y_new = ref_grid.lat

#original files and dimension
x_old = original_DataArray.lon
y_old = original_DataArray.lat
z_old = original_file #3-D memmap

fout = np.memmap('file_out',dtype='float32',mode='w+',shape=original_file.shape)
#any way to optimize this part??
for t in range(0,original_file.shape[0]):
    f = interpolate.interp2d(x_old,y_old,z_old[t,:,:])
    fout[t,:,:] = f(x_new,y_new)
fout.flush()

*Note: I use numpy memmaps to write the files to and from disk, as they are often to large to be handled in memory, and xarray DataArrays to handle the netCDF files.

csg2136
  • 235
  • 4
  • 10

3 Answers3

0

I don't think it's possible to avoid the loop, but since your grids are regular in (lon, lat) you could use the faster RectBivariateSpline as suggested by the documentation.

Furthermore, xarray adds an additional layer of indexing which is not used by your code and which may slow things down unnecessarily. In particular, if your data is stored in a file xarray might read it out only when asked to.

For example, you might do:

#original files and dimension
x_old = original_DataArray.lon.values
y_old = original_DataArray.lat.values
z_old = original_file.load().values

and also replace your indexing loop with:

for t, z in enumerate(z_old):
    # z is your 2d array, not sure if this loop is much faster though
Fabzi
  • 626
  • 5
  • 15
0

Also, fyi, you could use ESMpy to make this much faster.

Solomon Vimal
  • 920
  • 12
  • 27
0

There is a regrid() method in the CDAT tool that helps interpolating onto a new grid without looping through time slices manually. You can give the function the target grid you want to interpolate onto:

var1_regrid = var1.regrid(ref_var.getGrid())
var2_regrid = var2.regrid(ref_var.getGrid())
var3_regrid = var3.regrid(ref_var.getGrid())

So they can be put into computations like:

diff = var1_regrid - ref_var
Jason
  • 2,950
  • 2
  • 30
  • 50