2

I'm trying to regrid a NetCDF file from 0.125 degrees to 0.083-degree spatial scale. The netcdf contains 224 latitudes and 464 longitudes and it has daily data for one year.

I tried xarray for it but it produces this memory error: MemoryError: Unable to allocate 103. GiB for an array with shape (13858233841,) and data type float64

How can I regrid the file with python?

lsr729
  • 752
  • 2
  • 11
  • 25
  • Can you include a self-contained example of code you've tried that doesn't work? https://stackoverflow.com/help/minimal-reproducible-example – DopplerShift May 18 '20 at 14:57
  • Cau you post `ncinfo xx.nc` please ? Can you regrid to a coarser grid ? (netcdf4 files are afaik hdf5 files, which can be *really* messy -- `install h5py; h5stat xx.nc`) – denis Aug 12 '23 at 10:00

5 Answers5

5

Another option is try cf-python, which can (in general) regrid larger-than-memory datasets in both spherical polar coordinates and Cartesian coordinates. It uses the ESMF regridding engine to do this, so linear, first and second-order conservative, nearest neighbour, etc. regridding methods are available.

Here is an example of the kind of regridding that you need:

import cf
import numpy

f = cf.example_field(2) # Use cf.read to read your own data

print('Source field:')
print(f)

# Define the output grid
lat = cf.DimensionCoordinate(
           data=cf.Data(numpy.arange(-90, 90.01, 0.083), 'degreesN'))
lon = cf.DimensionCoordinate(
          data=cf.Data(numpy.arange(0, 360, 0.083), 'degreesE'))

# Regrid the field
g = f.regrids({'latitude': lat, 'longitude': lon}, method='linear')

print('\nRegridded field:')
print(g)

which produces:

Source field:
Field: air_potential_temperature (ncvar%air_potential_temperature)
------------------------------------------------------------------
Data            : air_potential_temperature(time(36), latitude(5), longitude(8)) K
Cell methods    : area: mean
Dimension coords: time(36) = [1959-12-16 12:00:00, ..., 1962-11-16 00:00:00]
                : latitude(5) = [-75.0, ..., 75.0] degrees_north
                : longitude(8) = [22.5, ..., 337.5] degrees_east
                : air_pressure(1) = [850.0] hPa

Regridded field:
Field: air_potential_temperature (ncvar%air_potential_temperature)
------------------------------------------------------------------
Data            : air_potential_temperature(time(36), latitude(2169), longitude(4338)) K
Cell methods    : area: mean
Dimension coords: time(36) = [1959-12-16 12:00:00, ..., 1962-11-16 00:00:00]
                : latitude(2169) = [-90.0, ..., 89.94399999999655] degreesN
                : longitude(4338) = [0.0, ..., 359.971] degreesE
                : air_pressure(1) = [850.0] hPa

There are plenty of options to get the destination grid from other fields, as well as defining it explicitly. More details can be found in the documentation

cf-python will infer which axes are X and Y, etc from the CF metadata attached to the dataset, but if that is missing then there are always ways to manually set it or work around it.

dhassell
  • 341
  • 2
  • 5
5

A Python option, using CDO as a backend, is my package nctoolkit: https://nctoolkit.readthedocs.io/en/latest/, instalable via pip (https://pypi.org/project/nctoolkit/)

It has a built in method called to_latlon which will regrid to a specified latlon grid

In your case, you would need to do:

import nctoolkit as nc
data = nc.open_data(infile)
data.to_latlon(lon = [lon_min,lon_max],lat=[lat_min,lat_max], res =[0.083, 0.083])
ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
Robert Wilson
  • 3,192
  • 11
  • 19
4

Xarray uses something called 'lazy loading' to try and avoid using too much memory. Somewhere in your code, you are using a command which loads the entirety of the data into memory, which it cannot do. Instead, you should specify the calculation, then save the result directly to file. Xarray will perform the calculation a chunk at a time without loading everything into memory.

An example of regridding might look something like this:

da_input = open_dataarray(
    'input.nc') # the file the data will be loaded from
regrid_axis = np.arange(-90, 90, 0.125) # new coordinates
da_output = da_input.interp(lat=regrid_axis) # specify calculation
da_ouput.to_netcdf('output.nc') # save direct to file

Doing da_input.load(), or da_output.compute(), for example, would cause all the data to be loaded into memory - which you want to avoid.

Charles
  • 97
  • 5
  • This seems to depend strongly on `chunks=` which has default `None`; see [performance-of-chunking-in-xarray-dask-when-opening-and-re-chunking-a-dataset](https://stackoverflow.com/questions/58838873/performance-of-chunking-in-xarray-dask-when-opening-and-re-chunking-a-dataset) – denis Jul 20 '23 at 14:58
4

The easiest way to do this is to use operators like cdo and nco.

For example:

cdo remapbil,target_grid infile.nc ofile.nc

The target_grid can be a descriptor file or you can use a NetCDF file with your desired grid resolution. Take note of other regridding methods that might suit your need. The example above is using bilinear interpolation.

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
Dani56
  • 372
  • 1
  • 2
  • 7
2

Another way to access the cdo functionality from within python is to make use of the Pypi cdo project:

pip install cdo 

Then you can do

from cdo import Cdo
cdo=Cdo()
cdo.remapbil("target_grid",input="in.nc",output="out.nc")

where target_grid is your usual list of options

  1. a nc file to use the grid from
  2. a regular grid specifier e.g. r360x180
  3. a txt file with a grid descriptor (see below)

There are several methods built in for the regridding:

  • remapbic : bicubic interpolation
  • remapbil : bilinear interpolation
  • remapnn : nearest neighbour interpolation
  • remapcon : first order conservative remapping
  • remapcon2 : 2nd order conservative remapping

You can use a grid descriptor file to define the area you need to interpolate to...

in the file grid.txt

gridtype=lonlat
xfirst=X   (here X is the longitude of the left hand point)
xinc=0.083
xsize=NX   (here put the number of points in domain)
yfirst=Y
yinc=0.083
ysize=NY

For more details you can refer to my video guide on interpolation.

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
  • Can cdo regrid unstructured/irregular (with spatial resolution changing depending on the location) data? – Henrique Sep 21 '22 at 12:20
  • 1
    yes, cdo can handle a range of irregular and unstructured grids, for more details see here: https://code.mpimet.mpg.de/projects/cdo/embedded/index.html#x1-200001.5 – ClimateUnboxed Sep 21 '22 at 13:05