Regrid Netcdf file in python

Question

I'm trying to regrid a NetCDF file from 0.125 degrees to 0.083-degree spatial scale. The netcdf contains 224 latitudes and 464 longitudes and it has daily data for one year.

I tried xarray for it but it produces this memory error: MemoryError: Unable to allocate 103. GiB for an array with shape (13858233841,) and data type float64

How can I regrid the file with python?

Can you include a self-contained example of code you've tried that doesn't work? https://stackoverflow.com/help/minimal-reproducible-example — DopplerShift, May 18 '20 at 14:57
Cau you post `ncinfo xx.nc` please ? Can you regrid to a coarser grid ? (netcdf4 files are afaik hdf5 files, which can be *really* messy -- `install h5py; h5stat xx.nc`) — denis, Aug 12 '23 at 10:00

score 5 · Answer 1 · answered May 22 '20 at 10:46

Another option is try cf-python, which can (in general) regrid larger-than-memory datasets in both spherical polar coordinates and Cartesian coordinates. It uses the ESMF regridding engine to do this, so linear, first and second-order conservative, nearest neighbour, etc. regridding methods are available.

Here is an example of the kind of regridding that you need:

import cf
import numpy

f = cf.example_field(2) # Use cf.read to read your own data

print('Source field:')
print(f)

# Define the output grid
lat = cf.DimensionCoordinate(
           data=cf.Data(numpy.arange(-90, 90.01, 0.083), 'degreesN'))
lon = cf.DimensionCoordinate(
          data=cf.Data(numpy.arange(0, 360, 0.083), 'degreesE'))

# Regrid the field
g = f.regrids({'latitude': lat, 'longitude': lon}, method='linear')

print('\nRegridded field:')
print(g)

which produces:

Source field:
Field: air_potential_temperature (ncvar%air_potential_temperature)
------------------------------------------------------------------
Data            : air_potential_temperature(time(36), latitude(5), longitude(8)) K
Cell methods    : area: mean
Dimension coords: time(36) = [1959-12-16 12:00:00, ..., 1962-11-16 00:00:00]
                : latitude(5) = [-75.0, ..., 75.0] degrees_north
                : longitude(8) = [22.5, ..., 337.5] degrees_east
                : air_pressure(1) = [850.0] hPa

Regridded field:
Field: air_potential_temperature (ncvar%air_potential_temperature)
------------------------------------------------------------------
Data            : air_potential_temperature(time(36), latitude(2169), longitude(4338)) K
Cell methods    : area: mean
Dimension coords: time(36) = [1959-12-16 12:00:00, ..., 1962-11-16 00:00:00]
                : latitude(2169) = [-90.0, ..., 89.94399999999655] degreesN
                : longitude(4338) = [0.0, ..., 359.971] degreesE
                : air_pressure(1) = [850.0] hPa

There are plenty of options to get the destination grid from other fields, as well as defining it explicitly. More details can be found in the documentation

cf-python will infer which axes are X and Y, etc from the CF metadata attached to the dataset, but if that is missing then there are always ways to manually set it or work around it.

score 5 · Accepted Answer · edited Mar 03 '22 at 20:54

A Python option, using CDO as a backend, is my package nctoolkit: https://nctoolkit.readthedocs.io/en/latest/, instalable via pip (https://pypi.org/project/nctoolkit/)

It has a built in method called to_latlon which will regrid to a specified latlon grid

In your case, you would need to do:

import nctoolkit as nc
data = nc.open_data(infile)
data.to_latlon(lon = [lon_min,lon_max],lat=[lat_min,lat_max], res =[0.083, 0.083])

score 4 · Answer 3 · answered May 18 '20 at 12:52

Xarray uses something called 'lazy loading' to try and avoid using too much memory. Somewhere in your code, you are using a command which loads the entirety of the data into memory, which it cannot do. Instead, you should specify the calculation, then save the result directly to file. Xarray will perform the calculation a chunk at a time without loading everything into memory.

An example of regridding might look something like this:

da_input = open_dataarray(
    'input.nc') # the file the data will be loaded from
regrid_axis = np.arange(-90, 90, 0.125) # new coordinates
da_output = da_input.interp(lat=regrid_axis) # specify calculation
da_ouput.to_netcdf('output.nc') # save direct to file

Doing da_input.load(), or da_output.compute(), for example, would cause all the data to be loaded into memory - which you want to avoid.

This seems to depend strongly on `chunks=` which has default `None`; see [performance-of-chunking-in-xarray-dask-when-opening-and-re-chunking-a-dataset](https://stackoverflow.com/questions/58838873/performance-of-chunking-in-xarray-dask-when-opening-and-re-chunking-a-dataset) — denis, Jul 20 '23 at 14:58

score 4 · Answer 4 · edited Jun 04 '22 at 06:33

4

The easiest way to do this is to use operators like cdo and nco.

For example:

cdo remapbil,target_grid infile.nc ofile.nc

The target_grid can be a descriptor file or you can use a NetCDF file with your desired grid resolution. Take note of other regridding methods that might suit your need. The example above is using bilinear interpolation.

edited Jun 04 '22 at 06:33

ClimateUnboxed

7,106
3
41
86

answered May 20 '20 at 11:14

Dani56

372
1
2
7

ClimateUnboxed · Answer 5 · 2022-03-15T11:11:16.863

Another way to access the cdo functionality from within python is to make use of the Pypi cdo project:

pip install cdo

Then you can do

from cdo import Cdo
cdo=Cdo()
cdo.remapbil("target_grid",input="in.nc",output="out.nc")

where target_grid is your usual list of options

a nc file to use the grid from
a regular grid specifier e.g. r360x180
a txt file with a grid descriptor (see below)

There are several methods built in for the regridding:

remapbic : bicubic interpolation
remapbil : bilinear interpolation
remapnn : nearest neighbour interpolation
remapcon : first order conservative remapping
remapcon2 : 2nd order conservative remapping

You can use a grid descriptor file to define the area you need to interpolate to...

in the file grid.txt

gridtype=lonlat
xfirst=X   (here X is the longitude of the left hand point)
xinc=0.083
xsize=NX   (here put the number of points in domain)
yfirst=Y
yinc=0.083
ysize=NY

For more details you can refer to my video guide on interpolation.

Can cdo regrid unstructured/irregular (with spatial resolution changing depending on the location) data? — Henrique, Sep 21 '22 at 12:20
yes, cdo can handle a range of irregular and unstructured grids, for more details see here: https://code.mpimet.mpg.de/projects/cdo/embedded/index.html#x1-200001.5 — ClimateUnboxed, Sep 21 '22 at 13:05

Regrid Netcdf file in python

5 Answers5

Linked