The reason this doesn't work for your data specifically is that you don't have a datetime coordinate "time". Instead, you have a dimension "time" with no coordinate data labeling it, plus data variables that hold the individual date components. Because of this, you have to reference the "month" data variable directly and use it to slice your data.
You could always construct a datetime coordinate from the day, month, and year values in your data and assign it as the "time" coordinate. The usual time series functionality built into xarray would then work.
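A minimal sketch of that conversion follows. It assumes the date parts live in data variables named "day", "month" and "year" along a "time" dimension, as in your dataset. The five-day dataset is a made-up stand-in for your data. pandas.to_datetime can assemble timestamps from a DataFrame with year/month/day columns, and assign_coords attaches the result as the "time" coordinate.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up stand-in for the original data: date parts stored as data
# variables along a "time" dimension that has no coordinate of its own.
parts = pd.date_range("2020-01-30", periods=5, freq="D")
ds = xr.Dataset(
    data_vars={
        "day": (("time",), np.asarray(parts.day)),
        "month": (("time",), np.asarray(parts.month)),
        "year": (("time",), np.asarray(parts.year)),
        "temperature": (("time",), np.arange(5.0)),
    },
)

# pd.to_datetime assembles timestamps from a DataFrame whose columns are
# named "year", "month" and "day".
times = pd.to_datetime(
    pd.DataFrame(
        {
            "year": ds["year"].values,
            "month": ds["month"].values,
            "day": ds["day"].values,
        }
    )
)

# Attach the datetime values as the "time" coordinate and drop the
# now-redundant date-part variables.
ds = ds.assign_coords(time=times.values).drop_vars(["day", "month", "year"])

print(ds.time.dt.month.values)  # [1 1 2 2 2]
```

After this, ds.time.dt and label-based selection such as ds.sel(time="2020-02") behave as they do for any datetime-indexed dataset.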
As an example, here's a dataset similar to yours in structure:
In [5]: import numpy as np
   ...: import pandas as pd
   ...: import xarray as xr

In [6]: dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
   ...:
   ...: ds = xr.Dataset(
   ...:     coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45]},
   ...:     data_vars={
   ...:         "day": (("time",), dates.day),
   ...:         "month": (("time",), dates.month),
   ...:         "year": (("time",), dates.year),
   ...:         "temperature": (
   ...:             ("lat", "lon", "time"),
   ...:             np.random.random(size=(2, 4, len(dates))),
   ...:         ),
   ...:     },
   ...: )
In [7]: ds
Out[7]:
<xarray.Dataset>
Dimensions:      (time: 366, lat: 2, lon: 4)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
Dimensions without coordinates: time
Data variables:
    day          (time) int64 1 2 3 4 5 6 7 8 9 ... 23 24 25 26 27 28 29 30 31
    month        (time) int64 1 1 1 1 1 1 1 1 1 1 ... 12 12 12 12 12 12 12 12 12
    year         (time) int64 2020 2020 2020 2020 2020 ... 2020 2020 2020 2020
    temperature  (lat, lon, time) float64 0.2308 0.3257 ... 0.3501 0.009162
Note that "time" is a special "dimension without coordinates": there are no labels on the time dimension, and xarray knows nothing about "time" except its length and which of your data variables it indexes. Importantly, this means that in your data "time" carries no datetime values at all.
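To make that concrete, here is a small sketch using a made-up dataset with the same layout. It shows that xarray knows the length of "time" but has no labels for it. Asking for ds["time"] anyway gives a placeholder of integer positions, which is why the .dt accessor cannot be used.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up dataset shaped like the question's: "time" is only a dimension.
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
ds = xr.Dataset(
    data_vars={
        "month": (("time",), np.asarray(dates.month)),
        "temperature": (("time",), np.zeros(len(dates))),
    },
)

print(ds.sizes["time"])      # 366: xarray knows how long "time" is...
print("time" in ds.coords)   # False: ...but has no labels for it

# ds["time"] is a placeholder of integer positions 0..365, not dates.
print(ds["time"].values[:3])  # [0 1 2]

# The datetime accessor is only available for datetime-like data.
try:
    ds["time"].dt.month
    has_dt = True
except AttributeError:
    has_dt = False
print(has_dt)  # False
```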
Because "month" is a data variable in the dataset, you need to reference it directly, as you found, and the datetime accessor ds.time.dt is not available:
In [8]: ds.loc[{"time": ds.month == 2}]
Out[8]:
<xarray.Dataset>
Dimensions:      (time: 29, lat: 2, lon: 4)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
Dimensions without coordinates: time
Data variables:
    day          (time) int64 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
    month        (time) int64 2 2 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2 2
    year         (time) int64 2020 2020 2020 2020 2020 ... 2020 2020 2020 2020
    temperature  (lat, lon, time) float64 0.2821 0.08776 0.2018 ... 0.929 0.4774
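Since "time" has no index, the boolean mask above effectively selects by position. The sketch below makes that positional step explicit with isel. It also combines conditions on two date-part variables to pick out a single month of a single year. The two-year dataset is a made-up stand-in, and np.flatnonzero is just one way to turn the mask into integer positions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up two-year dataset with date parts as data variables.
dates = pd.date_range("2020-01-01", "2021-12-31", freq="D")
ds = xr.Dataset(
    data_vars={
        "month": (("time",), np.asarray(dates.month)),
        "year": (("time",), np.asarray(dates.year)),
        "temperature": (("time",), np.random.random(len(dates))),
    },
)

# Boolean mask from the date-part variables (February 2020 only),
# converted to integer positions along "time".
mask = (ds["month"] == 2) & (ds["year"] == 2020)
positions = np.flatnonzero(mask.values)

feb_2020 = ds.isel(time=positions)
print(feb_2020.sizes["time"])  # 29 (2020 is a leap year)
```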
If the "time" dimension had a corresponding coordinate of datetime type (e.g. one created by assigning the earlier dates array to "time"), everything would work as you expect:
In [10]: dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
    ...:
    ...: ds = xr.Dataset(
    ...:     coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45], "time": dates},
    ...:     data_vars={
    ...:         "temperature": (
    ...:             ("lat", "lon", "time"),
    ...:             np.random.random(size=(2, 4, len(dates))),
    ...:         ),
    ...:     },
    ...: )
In [11]: ds
Out[11]:
<xarray.Dataset>
Dimensions:      (lat: 2, lon: 4, time: 366)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
  * time         (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2020-12-31
Data variables:
    temperature  (lat, lon, time) float64 0.09064 0.5252 ... 0.08733 0.6283
Now the xarray datetime accessor works the way you'd expect:
In [12]: ds.loc[{"time": ds.time.dt.month == 2}]
Out[12]:
<xarray.Dataset>
Dimensions:      (lat: 2, lon: 4, time: 29)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
  * time         (time) datetime64[ns] 2020-02-01 2020-02-02 ... 2020-02-29
Data variables:
    temperature  (lat, lon, time) float64 0.3407 0.6847 0.3073 ... 0.8578 0.1335
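With a datetime "time" coordinate in place, the rest of xarray's time series tooling also becomes available. This sketch uses a made-up one-year dataset to show three common examples: partial-string selection of a month, grouping by the virtual "time.month" variable, and resampling to month-start ("MS") frequency.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up one-year dataset with a proper datetime "time" coordinate.
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
ds = xr.Dataset(
    coords={"time": dates},
    data_vars={"temperature": (("time",), np.random.random(len(dates)))},
)

# A partial datetime string selects the whole period from the index.
feb = ds.sel(time="2020-02")

# The virtual "time.month" variable can drive groupby directly.
monthly_mean = ds.groupby("time.month").mean()

# Resample to calendar months ("MS" = month-start frequency).
monthly_resampled = ds.resample(time="MS").mean()

print(feb.sizes["time"], monthly_mean.sizes["month"], monthly_resampled.sizes["time"])
# 29 12 12
```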
See xarray's docs on Coordinates and working with time series data for more info.