Say I have an xarray.Dataset object loaded in using xarray.open_dataset(..., decode_times=False) that looks like this when printed:

<xarray.Dataset>
Dimensions:    (bnds: 2, lat: 15, lon: 34, plev: 8, time: 3650)
Coordinates:
  * time       (time) float64 3.322e+04 3.322e+04 3.322e+04 3.322e+04 ...
  * plev       (plev) float64 1e+05 8.5e+04 7e+04 5e+04 2.5e+04 1e+04 5e+03 ...
  * lat        (lat) float64 40.46 43.25 46.04 48.84 51.63 54.42 57.21 60.0 ...
  * lon        (lon) float64 216.6 219.4 222.2 225.0 227.8 230.6 233.4 236.2 ...
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) float64 3.322e+04 3.322e+04 3.322e+04 3.322e+04 ...
    lat_bnds   (lat, bnds) float64 39.07 41.86 41.86 44.65 44.65 47.44 47.44 ...
    lon_bnds   (lon, bnds) float64 215.2 218.0 218.0 220.8 220.8 223.6 223.6 ...
    hus        (time, plev, lat, lon) float64 0.006508 0.007438 0.008751 ...

What would be the best way to subset this given multiple ranges for lat, lon, and time? I've tried chaining a series of conditions with xarray.Dataset.where, but I get this error:

IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load().

I can't load the entire dataset into memory, so what would be the typical way to do this?

pbreach

1 Answer

netCDF4 doesn't support all of the multi-dimensional indexing operations that NumPy supports. It does support slicing (which is very fast) and one-dimensional array indexing (somewhat slower).

Some things to try:

  • Index with slices (e.g., .sel(time=slice(start, end))) before indexing with 1-dimensional arrays. This should offload the array-based indexing from netCDF4 to Dask/NumPy.
  • Split up your indexing operations into more intermediate operations that index along fewer dimensions at once. It sounds like you've already tried this one, but maybe it's worth exploring a little more.
  • To optimize performance, try different Dask chunking schemes using .chunk().
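As a rough sketch of the first two suggestions, the example below builds a small in-memory dataset as a stand-in for the one in the question (the sizes, values, and range bounds are made up for illustration). It selects contiguous ranges with slices first, then indexes with a 1-D array along a single dimension, and finally combines several disjoint time ranges with `xr.concat`. With a real file you would presumably open it lazily, for example by passing `chunks=...` to `xarray.open_dataset`, which needs Dask installed.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the dataset in the question (sizes/values are made up).
time = np.arange(33220.0, 33220.0 + 100)   # undecoded time axis, one step per day
lat = np.linspace(40.46, 79.5, 15)
lon = np.linspace(216.6, 308.7, 34)
ds = xr.Dataset(
    {"hus": (("time", "lat", "lon"), np.random.rand(time.size, lat.size, lon.size))},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Step 1: contiguous ranges via label-based slices (both ends inclusive).
box = ds.sel(time=slice(33230, 33260), lat=slice(43, 58), lon=slice(220, 240))

# Step 2: only then index with a 1-D array, along one dimension at a time.
wanted_times = box["time"].values[::5]
subset = box.sel(time=wanted_times)

# Several disjoint ranges: slice each one, then concatenate along that dimension.
time_ranges = [(33225, 33230), (33250, 33255)]  # hypothetical ranges
pieces = [ds.sel(time=slice(start, end)) for start, end in time_ranges]
multi = xr.concat(pieces, dim="time")
```

Slicing first keeps the request that reaches the file backend simple, so the array-based selection in step 2 only has to work on the already reduced block.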

If that doesn't work, post a full self-contained example to the xarray issue tracker on GitHub and we can take a look into it in more detail.

shoyer
  • Thanks for this! I ended up using `.sel()` (didn't realize it could take slices for some reason). The only issue that I could see coming up is dealing with cylindrical coordinates for longitude. What would you suggest for this? Maybe subsetting longitude using `.where()` after `.sel()`? – pbreach Mar 01 '17 at 18:34
  • See this answer for a few ideas: http://gis.stackexchange.com/questions/205871/xarray-slicing-across-the-antimeridian – shoyer Mar 03 '17 at 02:42
  • Ahhhh okay didn't know I could do that with `.sel()` I really need to start reading the source code and docs better. Really great package btw! – pbreach Mar 03 '17 at 03:29
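
For the longitude wrap-around raised in the comments, one possible approach (a sketch only, and not necessarily what the linked answer proposes) is to take one slice on each side of the seam and join them with `xr.concat`. The grid, the variable name `hus`, and the 350 to 10 degree region below are assumptions chosen for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical global grid with longitudes on a 0..360 convention.
lon = np.arange(0.0, 360.0, 2.5)
lat = np.linspace(-10.0, 10.0, 5)
ds = xr.Dataset(
    {"hus": (("lat", "lon"), np.random.rand(lat.size, lon.size))},
    coords={"lat": lat, "lon": lon},
)

# The region 350E..10E crosses the 0/360 seam, so a single slice cannot cover it.
west = ds.sel(lon=slice(350, 360))   # 350 .. 357.5
east = ds.sel(lon=slice(0, 10))      # 0 .. 10
wrapped = xr.concat([west, east], dim="lon")
```

Both pieces are plain slices, so this stays within the indexing operations that the answer describes as fast.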