1

A have a netCDF file that I opened with xarray. I want to clip the xarray data set using a shapefile; however, I cannot figure out how to properly set my spatial dimensions.

I have the following data set:

print(ds.keys())

Dimensions:                          (sample: 86401, ddm: 4, delay: 17,
                                      doppler: 11)
Coordinates:
  * sample                           (sample) int32 0 1 2 ... 86398 86399 86400
  * ddm                              (ddm) int8 0 1 2 3
    ddm_timestamp_utc                (sample) datetime64[ns] ...
    sp_lat                           (sample, ddm) float32 ...
    sp_lon                           (sample, ddm) float32 ...
Dimensions without coordinates: delay, doppler
Data variables: (12/126)
    spacecraft_id                    int16 ...
    spacecraft_num                   int8 ...
    ddm_source                       int8 ...
    ddm_time_type_selector           int8 ...
    delay_resolution                 float32 ...

And from: print(ds.dims)

Frozen({'sample': 86401, 'ddm': 4, 'delay': 17, 'doppler': 11})

I have tried expanding the dimensions to include sp_lat and sp_lon with

ds.expand_dims(['x', 'y']
ds.rename_vars({'sp_lon': 'x', 'sp_lat': 'x'})

I have also tried

ds.rename({'sp_lon': 'x', 'sp_lat': 'y'})
ds.rio.set_spatial_dims('x', 'y', inplace=True)

And I even tried multi-indexing. How can I use my coordinates as my spatial dimensions so that I can clip my dataset with ds.rio.clip(...)

I am using xarray and rioxarray with python 3.9.

denis
  • 21,378
  • 10
  • 65
  • 88
  • when asking about xarray data objects, you can just `print(ds)` to give us a good sense of what the array structure is. That said, a [mre] would be very much appreciated! – Michael Delgado Apr 15 '22 at 16:36

1 Answers1

2

rioxarray expects your data to be on a regular grid. Your data appears to be observational data, with a lat, lon value given for the position (of a given spacecraft maybe?) at that time step. You will not be able to expand the dimensions of your array to include (lat, lon).

Instead, you can build an array indicating whether a given (sample, ddm) point is contained in a shapefile using the arrays sp_lat, sp_lon directly.

Example assignment to countries

For example, if you have a shapefile of countries:

In [7]: countries = gpd.read_file('https://naturalearth.s3.amazonaws.com/110m_cultural/ne_110m_admin_0_countries.zip')

and the following dataset:

In [13]: sample = np.arange(100)
    ...: ddm = np.arange(4)
    ...: timestep = pd.date_range('2020-01-01', periods=len(sample), freq='H')
    ...: sp_lat = np.random.random(size=(len(sample), len(ddm))) * 180 - 90
    ...: sp_lon = np.random.random(size=(len(sample), len(ddm))) * 360 - 180
    ...:
    ...: ds = xr.Dataset(
    ...:     {},
    ...:     coords={
    ...:         'sample': sample,
    ...:         'ddm': ddm,
    ...:         'ddm_timestamp_utc': (('sample', ), timestep),
    ...:         'sp_lat': (('sample', 'ddm'), sp_lat),
    ...:         'sp_lon': (('sample', 'ddm'), sp_lon),
    ...:     },
    ...: )

In [14]: ds
Out[14]:
<xarray.Dataset>
Dimensions:            (sample: 100, ddm: 4)
Coordinates:
  * sample             (sample) int64 0 1 2 3 4 5 6 7 ... 93 94 95 96 97 98 99
  * ddm                (ddm) int64 0 1 2 3
    ddm_timestamp_utc  (sample) datetime64[ns] 2020-01-01 ... 2020-01-05T03:0...
    sp_lat             (sample, ddm) float64 28.11 -88.63 15.52 ... 70.92 -51.87
    sp_lon             (sample, ddm) float64 -46.9 132.9 ... -70.12 -161.3
Data variables:
    *empty*

You can convert your x, y observations into a geopandas GeoDataFrame:

In [15]: x_flat = ds.sp_lon.values.ravel()
    ...: y_flat = ds.sp_lat.values.ravel()

In [19]: xy_point_array = gpd.GeoDataFrame(
    ...:     geometry=gpd.points_from_xy(x_flat, y_flat, crs='epsg:4326')
    ...: )

Then, use sjoin to assign all countries in the shapefile to a point (returning NaN for points not falling within a country):

In [20]: countries_by_point = xy_point_array.sjoin(countries, how='left')

The result can then be reshaped into the dimensions of your point arrays and returned to xarray:

In [24]: ds.coords['country'] = (
    ...:     ('sample', 'ddm'),
    ...:     countries_by_point.ADM0_A3.values.reshape(sp_lat.shape),
    ...: )

In [25]: ds
Out[25]:
<xarray.Dataset>
Dimensions:            (sample: 100, ddm: 4)
Coordinates:
  * sample             (sample) int64 0 1 2 3 4 5 6 7 ... 93 94 95 96 97 98 99
  * ddm                (ddm) int64 0 1 2 3
    ddm_timestamp_utc  (sample) datetime64[ns] 2020-01-01 ... 2020-01-05T03:0...
    sp_lat             (sample, ddm) float64 28.11 -88.63 15.52 ... 70.92 -51.87
    sp_lon             (sample, ddm) float64 -46.9 132.9 ... -70.12 -161.3
    country            (sample, ddm) object nan 'ATA' nan nan ... 'ATA' nan nan
Data variables:
    *empty*
Michael Delgado
  • 13,789
  • 3
  • 29
  • 54
  • Thank you for the response. I did a similar approach here by pulling the sp_lat, sp_lon, and the other columns I needed into a geopandas dataframe. Also this data was for a satellite (CYGNSS), I was frustrated that I could not use the coordinates. – TylerSingleton Apr 15 '22 at 18:23
  • 1
    yeah. you totally would be able to for a gridded dataset, but your data just doens't match the data model rioxarray is targeting (rasters). we run into this all the time parsing hurriane tracks :) – Michael Delgado Apr 15 '22 at 18:47