I have a gridded xarray dataset that looks like this:
print(ds)
<xarray.Dataset>
Dimensions: (month: 12, isobaricInhPa: 37, latitude: 721, longitude: 1440)
Coordinates:
* isobaricInhPa (isobaricInhPa) float64 1e+03 975.0 950.0 ... 3.0 2.0 1.0
* latitude (latitude) float64 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
* longitude (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
* month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
h (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 405, 720), meta=np.ndarray>
speed (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 405, 720), meta=np.ndarray>
direction (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 405, 720), meta=np.ndarray>
From this dataset, I extracted data for a series of points using:
ds_volc = ds.sel(latitude=volc['Latitude'].values,longitude=volc['Longitude'].values, method='nearest', drop=True)
where volc
is a Pandas DataFrame and volc['Latitude'].values
and volc['Longitude'].values
are vectors of lat/lon pairs for my 1431 points of interests.
print(ds_volc)
<xarray.Dataset>
Dimensions: (month: 12, isobaricInhPa: 37, latitude: 1431, longitude: 1431)
Coordinates:
* isobaricInhPa (isobaricInhPa) float64 1e+03 975.0 950.0 ... 3.0 2.0 1.0
* latitude (latitude) float64 50.25 45.75 42.25 ... -56.0 -64.25 -62.0
* longitude (longitude) float64 6.75 3.0 2.5 0.0 ... 0.0 0.0 0.0 0.0
* month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
h (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 190, 1431), meta=np.ndarray>
speed (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 190, 1431), meta=np.ndarray>
direction (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 190, 1431), meta=np.ndarray>
Rather than using pairs of lat/lon coordinates, I need to add the ID of my points (called Volcano Number
) as a new coordinate:
ds_volc = ds_volc.assign_coords({'Volcano Number':volc['Volcano Number'].values})
which results in:
<xarray.Dataset>
Dimensions: (month: 12, isobaricInhPa: 37, latitude: 1431, longitude: 1431, Volcano Number: 1431)
Coordinates:
* isobaricInhPa (isobaricInhPa) float64 1e+03 975.0 950.0 ... 3.0 2.0 1.0
* latitude (latitude) float64 50.25 45.75 42.25 ... -56.0 -64.25 -62.0
* longitude (longitude) float64 6.75 3.0 2.5 0.0 ... 0.0 0.0 0.0 0.0
* month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
* Volcano Number (Volcano Number) int64 210010 210020 ... 390829 390847
Data variables:
h (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 190, 1431), meta=np.ndarray>
speed (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 190, 1431), meta=np.ndarray>
direction (month, isobaricInhPa, latitude, longitude) float32 dask.array<chunksize=(1, 19, 190, 1431), meta=np.ndarray>
The question
Since I added the Volcano Number
coordinate, the latitude
and longitude
coordinates (and dimensions) become obsolete and I need to reorganise the dimensions of the variables. I thought I could simply use ds_volc.drop_dims(['latitude', 'longitude'])
, but that drops the associated variables. I have also tried ds_volc.sel({'Volcano Number': 210010}, drop=True)
, but that returns a xarray.Dataset with 1431 longitude and 1431 latitude points.
Therefore, is there a way to somehow reshape the variables so their dimensions become (month: 12, isobaricInhPa: 37, Volcano Number: 1431)
instead of (month, isobaricInhPa, latitude, longitude)
, so as to return a xarray.Dataset of dimension (month: 12, isobaricInhPa: 37)
when I query with ds_volc.sel({'Volcano Number': 210010}, drop=True)
?