I have gridded daily temperature data but am only interested in the winter months.

from netCDF4 import Dataset as netcdf_dataset
import numpy as np
import xarray as xr

# open the Berkeley Earth gridded temperature netCDF file
df = xr.open_dataset('BerkeleyEarth.nc')

#pull out temperature variable
air=df.temperature

#select only winter months
WinterAir = air[(air.time.dt.month >= 12) | (air.time.dt.month <= 2)]

When I try to select the months this way I get the following error message: AttributeError: 'DataArray' object has no attribute 'month'. How do I select only winter months?

Here is a screenshot of the netCDF file: [image not included]

Megan Martin
  • Can you just use df.month? – lsr729 Feb 28 '22 at 17:03
  • But how would I select the df.temperature values that correspond with the correct df.month values? I'm new to xarray and python – Megan Martin Feb 28 '22 at 17:04
  • Does this answer your question? [How to select an inter-year period with xarray?](https://stackoverflow.com/questions/52533630/how-to-select-an-inter-year-period-with-xarray) – ClimateUnboxed Feb 28 '22 at 21:48
  • Note that this is not an exact duplicate of the linked question. In this question's case, the data is shaped oddly and does not have a datetime coordinate. The OP is asking how to access the DatetimeAccessor attributes on a dataset without a time coordinate. Answer: you can't. – Michael Delgado Mar 01 '22 at 17:56

2 Answers

I was able to do this by indexing with the dataset's month variable instead of air.time.dt.month:

# select only winter months (December, January, February)
WinterAir = air[(df.month >= 12) | (df.month <= 2)]
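The same selection can also be written with `isin` and a named dimension. The following is only a sketch on a toy dataset: the variable names `month` and `temperature` come from the question, while the dimension order `(time, latitude, longitude)` and the random values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy stand-in for the real file: "time" has no coordinate and "month" is an
# ordinary data variable along it. The dimension order used for temperature
# here is an assumption, not taken from the actual file.
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
df = xr.Dataset(
    data_vars={
        "month": (("time",), dates.month),
        "temperature": (
            ("time", "latitude", "longitude"),
            np.random.random(size=(len(dates), 2, 4)),
        ),
    },
)

# Boolean mask along "time": True for December, January and February
is_winter = df.month.isin([12, 1, 2])

# Indexing by dimension name does not depend on where "time" sits
# in the dimension order of temperature
WinterAir = df.temperature.isel(time=is_winter)
```

Naming the dimension in `isel` avoids relying on `time` being the first axis, which positional indexing such as `air[mask]` does.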
ClimateUnboxed
Megan Martin

Your original approach fails for this data because there is no datetime coordinate named time. Instead, time is a dimension with no coordinate labeling it, and the date components (day, month, year) are stored as separate data variables. Because of this, you can reference the month data variable directly and use it to subset your data.

You could always construct a datetime coordinate using the day, month, and year values in your data and assign that as the time coordinate, and then the usual time series functionality built into xarray would work.
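A minimal sketch of building that coordinate, assuming the dataset has integer-valued `day`, `month`, and `year` data variables along `time` (the toy dataset and the name `ds_with_time` are illustrative, not taken from your file):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset with date components stored as data variables and no
# coordinate on "time" (an assumed stand-in for the real file)
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
ds = xr.Dataset(
    coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45]},
    data_vars={
        "day": (("time",), dates.day),
        "month": (("time",), dates.month),
        "year": (("time",), dates.year),
        "temperature": (
            ("lat", "lon", "time"),
            np.random.random(size=(2, 4, len(dates))),
        ),
    },
)

# pandas can assemble datetimes from a frame with year/month/day columns
time_values = pd.to_datetime(
    pd.DataFrame(
        {
            "year": ds["year"].values,
            "month": ds["month"].values,
            "day": ds["day"].values,
        }
    )
).values

# Attach the datetimes as the coordinate for "time", then drop the
# now-redundant component variables
ds_with_time = ds.assign_coords(time=("time", time_values)).drop_vars(
    ["day", "month", "year"]
)
```

If the component variables in the real file are stored as floats, they may need to be cast to integers first.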

As an example, here's a dataset similar to yours in structure:

In [6]: dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
   ...:
   ...: ds = xr.Dataset(
   ...:     coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45]},
   ...:     data_vars={
   ...:         "day": (("time",), dates.day),
   ...:         "month": (("time",), dates.month),
   ...:         "year": (("time",), dates.year),
   ...:         "temperature": (
   ...:             ("lat", "lon", "time"),
   ...:             np.random.random(size=(2, 4, len(dates))),
   ...:         ),
   ...:     },
   ...: )

In [7]: ds
Out[7]:
<xarray.Dataset>
Dimensions:      (time: 366, lat: 2, lon: 4)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
Dimensions without coordinates: time
Data variables:
    day          (time) int64 1 2 3 4 5 6 7 8 9 ... 23 24 25 26 27 28 29 30 31
    month        (time) int64 1 1 1 1 1 1 1 1 1 1 ... 12 12 12 12 12 12 12 12 12
    year         (time) int64 2020 2020 2020 2020 2020 ... 2020 2020 2020 2020
    temperature  (lat, lon, time) float64 0.2308 0.3257 ... 0.3501 0.009162

Note that time is a special "dimension without coordinates" - this means that there are no labels on the time dimension, and xarray does not know anything about "time" except that it has a certain shape and is the dimension indexing several of your data variables. Importantly, in your data, time is not a datetime type.

Because month is a data variable in the dataset, you need to reference month directly, as you found, and the DatetimeAccessor ds.time.dt is not available:

In [8]: ds.loc[{"time": ds.month == 2}]
Out[8]:
<xarray.Dataset>
Dimensions:      (time: 29, lat: 2, lon: 4)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
Dimensions without coordinates: time
Data variables:
    day          (time) int64 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
    month        (time) int64 2 2 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2 2
    year         (time) int64 2020 2020 2020 2020 2020 ... 2020 2020 2020 2020
    temperature  (lat, lon, time) float64 0.2821 0.08776 0.2018 ... 0.929 0.4774

If the time dimension had a corresponding coordinate of type datetime, e.g. by assigning the previous dates array to the time coord, everything would work as you expect:

In [10]: dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
    ...:
    ...: ds = xr.Dataset(
    ...:     coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45], "time": dates},
    ...:     data_vars={
    ...:         "temperature": (
    ...:             ("lat", "lon", "time"),
    ...:             np.random.random(size=(2, 4, len(dates))),
    ...:         ),
    ...:     },
    ...: )

In [11]: ds
Out[11]:
<xarray.Dataset>
Dimensions:      (lat: 2, lon: 4, time: 366)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
  * time         (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2020-12-31
Data variables:
    temperature  (lat, lon, time) float64 0.09064 0.5252 ... 0.08733 0.6283

Now the xarray datetime accessors work the way you'd expect:

In [12]: ds.loc[{"time": ds.time.dt.month == 2}]
Out[12]:
<xarray.Dataset>
Dimensions:      (lat: 2, lon: 4, time: 29)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
  * time         (time) datetime64[ns] 2020-02-01 2020-02-02 ... 2020-02-29
Data variables:
    temperature  (lat, lon, time) float64 0.3407 0.6847 0.3073 ... 0.8578 0.1335
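Applied to the winter months from the question, the same idea might look like the sketch below. It rebuilds a toy dataset with a datetime `time` coordinate; the names `is_winter`, `winter`, and `winter_by_season` are illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset with a proper datetime coordinate on "time"
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
ds = xr.Dataset(
    coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45], "time": dates},
    data_vars={
        "temperature": (
            ("lat", "lon", "time"),
            np.random.random(size=(2, 4, len(dates))),
        ),
    },
)

# Option 1: test month membership through the datetime accessor
is_winter = ds.time.dt.month.isin([12, 1, 2])
winter = ds.sel(time=is_winter)

# Option 2: use the meteorological season label provided by the accessor
winter_by_season = ds.sel(time=ds.time.dt.season == "DJF")
```

Both options keep December, January, and February of each calendar year; neither groups a December with the following January and February, which is what the linked inter-year question deals with.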

See xarray's docs on Coordinates and working with time series data for more info.

Michael Delgado
  • Thank you! I am now trying to create a new xarray dataset with time as a coordinate. Here's what I have: `latitude=df.latitude longitude=df.longitude temperature=df.temperature dates = pd.date_range("1950-01-01", "2021-12-31", freq="D") NewBerkeley = xr.Dataset( coords={"lon": longitude, "lat": latitude, "time": dates}, data_vars={"temperature":(( "lat", "lon","time"),temperature,),},)` but I get the following error: "ValueError: conflicting sizes for dimension 'time': length 26298 on 'time' and length 360 on 'temperature'" – Megan Martin Mar 01 '22 at 20:00
  • "Conflicting sizes for dimension 'time': length 26298 on 'time' and length 360 on 'temperature'" is telling you exactly what it sounds like: you need to make sure the `time` dim on temperature is the same length as the coordinate you're trying to assign. – Michael Delgado Mar 01 '22 at 20:56
  • oh - make sure that the dimensions of temperature are actually `(lat, lon, time)` and not some other ordering? – Michael Delgado Mar 01 '22 at 20:58
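
One way to check the ordering suggested in the last comment is to print the array's dimensions before labeling them. The sketch below uses a small toy array stored time-first; that storage order is an assumption for illustration (a trailing axis of length 360 in the error message would be consistent with longitude coming last, but that is a guess about the real file), and the names `raw` and `new_ds` are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

dates = pd.date_range("2020-01-01", "2020-01-10", freq="D")
lat = np.linspace(-60, 60, 3)
lon = np.linspace(-135, 135, 4)

# Toy array stored time-first (assumed layout for this example)
raw = xr.DataArray(
    np.random.random(size=(len(dates), len(lat), len(lon))),
    dims=("time", "latitude", "longitude"),
)

# Inspect the actual axis order and lengths before building the dataset
print(raw.dims)
print(dict(raw.sizes))

# Label the axes in the order they are actually stored, so that each
# dimension name lines up with a coordinate of the same length
new_ds = xr.Dataset(
    coords={"lon": lon, "lat": lat, "time": dates},
    data_vars={"temperature": (("time", "lat", "lon"), raw.values)},
)

# Reorder afterwards if a (lat, lon, time) layout is preferred
new_ds = new_ds.transpose("lat", "lon", "time")
```

The date range also has to contain exactly as many entries as the data's time axis, otherwise the same size conflict appears for a different reason.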