
I've downloaded a NetCDF file from the Climate Data Store and would like to write a CRS to it so that I can clip it with a shapefile. However, I get an error when assigning a CRS. Below are my script and what it prints. I receive this error after trying to write the CRS:

MissingSpatialDimensionError: y dimension (lat) not found. Data variable: lon_bnds

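import xarray as xr
import rioxarray  # registers the .rio accessor
from shapely.geometry import mapping

# load netcdf with xarray (netcdf_fn is the path to the downloaded file)
dset = xr.open_dataset(netcdf_fn)
print(dset)

# add projection system to nc
dset = dset.rio.write_crs("EPSG:4326", inplace=True)

# mask CMIP6 data with shapefile (shp is a GeoDataFrame)
dset_shp = dset.rio.clip(shp.geometry.apply(mapping), shp.crs)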

dset
Out[44]: 
<xarray.Dataset>
Dimensions:      (time: 1825, bnds: 2, lat: 2, lon: 1)
Coordinates:
  * time         (time) object 2021-01-01 12:00:00 ... 2025-12-31 12:00:00
  * lat          (lat) float64 0.4712 1.414
  * lon          (lon) float64 31.25
    spatial_ref  int32 0
Dimensions without coordinates: bnds
Data variables:
    time_bnds    (time, bnds) object ...
    lat_bnds     (lat, bnds) float64 0.0 0.9424 0.9424 1.885
    lon_bnds     (lon, bnds) float64 ...
    pr           (time, lat, lon) float32 ...
Attributes: (12/48)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            ScenarioMIP
    branch_method:          standard
    branch_time_in_child:   60225.0
    branch_time_in_parent:  60225.0
    comment:                none
                    ...
    title:                  CMCC-ESM2 output prepared for CMIP6
    variable_id:            pr
    variant_label:          r1i1p1f1
    license:                CMIP6 model data produced by CMCC is licensed und...
    cmor_version:           3.6.0
    tracking_id:            hdl:21.14100/0c6732f7-2cdd-4296-99a0-7952b7ca911e
CrossLord

1 Answer


When you call the rioxarray accessor method ds.rio.clip on an xr.Dataset rather than an xr.DataArray, rioxarray needs to guess which variables in the dataset should be clipped. The method's docstring gives the following warning:

Warning:

Clips variables that have dimensions 'x'/'y'. Others are appended as is.

So the issue you're running into is that rioxarray sees four variables in your dataset:

Data variables:
    time_bnds    (time, bnds) object ...
    lat_bnds     (lat, bnds) float64 0.0 0.9424 0.9424 1.885
    lon_bnds     (lon, bnds) float64 ...
    pr           (time, lat, lon) float32 ...

Of these, lat_bnds, lon_bnds, and pr each have an x or y dimension, so each could conceivably be clipped. Rather than make an arbitrary choice about what to do in this situation, rioxarray raises an error with the message MissingSpatialDimensionError: y dimension (lat) not found. Data variable: lon_bnds. This means that while processing the variable lon_bnds, it found an x dimension (lon) but no y dimension (lat), so it doesn't know how to clip it.
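To make the distinction concrete, here is a rough, hypothetical approximation of the rule described above (it is not rioxarray's actual source): a variable with both spatial dimensions is a clip candidate, one with neither is passed through unchanged, and one with only one of them is ambiguous. The helper name classify_for_clip and the assumption that lon/lat are the x/y dimensions are for illustration only; the dimension tuples are copied from the printed dataset.

```python
def classify_for_clip(var_dims, x_dim="lon", y_dim="lat"):
    """Sort variables by how a Dataset-wide clip would treat them.

    Hypothetical approximation of the behaviour described in the answer,
    not rioxarray's implementation.
    """
    result = {"clip": [], "pass_through": [], "ambiguous": []}
    for name, dims in var_dims.items():
        has_x = x_dim in dims
        has_y = y_dim in dims
        if has_x and has_y:
            result["clip"].append(name)
        elif has_x or has_y:
            # Only one spatial dimension: there is no 2D grid to clip.
            result["ambiguous"].append(name)
        else:
            # No spatial dimension at all: appended as is.
            result["pass_through"].append(name)
    return result


# Dimensions of the data variables, copied from the printed dataset.
dims_from_question = {
    "time_bnds": ("time", "bnds"),
    "lat_bnds": ("lat", "bnds"),
    "lon_bnds": ("lon", "bnds"),
    "pr": ("time", "lat", "lon"),
}

print(classify_for_clip(dims_from_question))
# {'clip': ['pr'], 'pass_through': ['time_bnds'], 'ambiguous': ['lat_bnds', 'lon_bnds']}
```

With the real dataset, the input can be built with {name: var.dims for name, var in dset.data_vars.items()}.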

To address this, you have two options. The first is to call clip on the pr array only. This is probably the right call: generally, I'd recommend doing data processing on DataArrays rather than Datasets whenever possible, unless you really know you want to map an operation across all variables in the dataset. Calling clip on pr would look like this:

clipped = dset.pr.rio.clip(shp.geometry.apply(mapping), shp.crs)

Alternatively, you could resolve the underlying issue: you have data variables that really should be coordinates. You can use the method set_coords to reclassify the bounds variables, which aren't really data, as non-dimension coordinates. In this case:

dset = dset.set_coords(['time_bnds', 'lat_bnds', 'lon_bnds'])

I'm not sure whether this will completely resolve your issue; it's possible that rioxarray will still raise this error when processing coordinates. You could also drop the bounds variables altogether. But the first method, calling clip on a single variable, will work.
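The difference between reclassifying and dropping the bounds can be seen on a small hypothetical dataset with the same layout as the one in the question (the bounds values are zero placeholders, not the real CMCC-ESM2 values). This sketch only shows the xarray side; it does not check whether rioxarray then clips the result cleanly.

```python
import numpy as np
import xarray as xr

# Hypothetical miniature of the dataset printed in the question.
dset = xr.Dataset(
    {
        "time_bnds": (("time", "bnds"), np.zeros((3, 2))),
        "lat_bnds": (("lat", "bnds"), np.zeros((2, 2))),
        "lon_bnds": (("lon", "bnds"), np.zeros((1, 2))),
        "pr": (("time", "lat", "lon"), np.zeros((3, 2, 1), dtype="float32")),
    },
    coords={"time": np.arange(3), "lat": [0.4712, 1.414], "lon": [31.25]},
)
bounds = ["time_bnds", "lat_bnds", "lon_bnds"]

# Option A: keep the bounds, but as non-dimension coordinates.
as_coords = dset.set_coords(bounds)
print(list(as_coords.data_vars))  # ['pr']

# Option B: drop the bounds entirely; the now-unused 'bnds' dim goes with them.
dropped = dset.drop_vars(bounds)
print(list(dropped.data_vars))  # ['pr']
print("bnds" in dropped.dims)  # False
```

In both cases pr is the only data variable left, which is the one a Dataset-wide clip is meant to act on.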

Michael Delgado