I have a NetCDF (.nc) file containing temperature data. I want to extract the temperature for the date range May 30th to August 18th in each of the years 2001 to 2018. The time variable has the format 2001-01-23 (YYYY-MM-DD). I do not mind whether the solution uses Python or cdo. My dataset looks like this:

<xarray.Dataset>
Dimensions:  (crs: 1, lat: 9, lon: 35, time: 6574)
Coordinates:
  * lat      (lat) float64 50.0 52.5 55.0 57.5 60.0 62.5 65.0 67.5 70.0
  * lon      (lon) float64 177.5 180.0 182.5 185.0 ... 255.0 257.5 260.0 262.5
  * crs      (crs) uint16 3
Dimensions without coordinates: time
Data variables:
    days     (time) datetime64[ns] 2001-01-01 2001-01-02 ... 2018-12-31
    tmax     (time, lat, lon) float32 ...

How can I extract the date range mentioned above for every year?

ClimateUnboxed
Thomas

2 Answers

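You have to make your variable days the coordinate that indexes the time dimension, for example with dataset = dataset.assign_coords(time=dataset["days"].values). Calling dataset.set_coords('days') alone only marks days as a non-index coordinate, so sel cannot look dates up along time. Then you can use sel to retrieve slices of your data:

dataset.sel(time=slice("2001-01-23", "2018-01-01"))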
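A single slice returns one contiguous period, so for the same window in every year one option is to take one slice per year and concatenate them. The following is only a sketch: the synthetic dataset is an assumption that mimics the layout in the question, and it assumes the dates have first been made the index of the time dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the dataset in the question (assumed layout:
# `days` is a data variable along an un-indexed `time` dimension).
days = pd.date_range("2001-01-01", "2018-12-31", freq="D")
dataset = xr.Dataset(
    {
        "days": ("time", days),
        "tmax": (("time", "lat", "lon"),
                 np.zeros((days.size, 2, 3), dtype="float32")),
    }
)

# Make the dates the index of the time dimension so label-based sel works.
dataset = dataset.assign_coords(time=("time", dataset["days"].values))

# One label-based slice per year (both endpoints inclusive), then concatenate.
pieces = [
    dataset.sel(time=slice(f"{year}-05-30", f"{year}-08-18"))
    for year in range(2001, 2019)
]
summers = xr.concat(pieces, dim="time")
```

Each yearly slice holds the 81 days from May 30th to August 18th, so summers has 81 entries per year along time.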

Further Readings on xarray and Time Series

dl.meteo

I typically find the best approach in these cases (where a simple range will not suffice) is to see if I can construct a boolean array with the same length as the time coordinate that is True if the value is a date I'd like to include in the selection, and False if it is not. Then I can pass this boolean array as an indexer in sel to get the selection I'd like.

For this example I would make use of the dayofyear, year, and is_leap_year attributes of the datetime accessor in xarray:

import pandas as pd

# Note dayofyear is the 1-based ordinal day of the year (January 1st is 1),
# so from March 1st onward it is one higher in leap years
# than in non-leap years.
may_30_leap = pd.Timestamp("2000-05-30").dayofyear
august_18_leap = pd.Timestamp("2000-08-18").dayofyear
range_leap = range(may_30_leap, august_18_leap + 1)

may_30_noleap = pd.Timestamp("2001-05-30").dayofyear
august_18_noleap = pd.Timestamp("2001-08-18").dayofyear
range_noleap = range(may_30_noleap, august_18_noleap + 1)

year_range = range(2001, 2019)

indexer = ((ds.days.dt.dayofyear.isin(range_leap) & ds.days.dt.is_leap_year) |
           (ds.days.dt.dayofyear.isin(range_noleap) & ~ds.days.dt.is_leap_year))
indexer = indexer & ds.days.dt.year.isin(year_range)

result = ds.sel(time=indexer)

The leap year logic is a bit clunky, but I can't think of a cleaner way.
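One way the leap year handling might be avoided is to compare month and day directly instead of dayofyear. The sketch below encodes each date as month * 100 + day (so May 30th becomes 530 and August 18th becomes 818); the synthetic ds is an assumption standing in for the dataset in the question.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the dataset in the question (assumed layout:
# `days` is a data variable along an un-indexed `time` dimension).
days = pd.date_range("2001-01-01", "2018-12-31", freq="D")
ds = xr.Dataset(
    {
        "days": ("time", days),
        "tmax": (("time", "lat", "lon"),
                 np.zeros((days.size, 2, 3), dtype="float32")),
    }
)

# Encode each date as month * 100 + day; this number does not depend on
# whether the year is a leap year.
month_day = ds.days.dt.month * 100 + ds.days.dt.day
mask = (month_day >= 530) & (month_day <= 818)
mask = mask & ds.days.dt.year.isin(range(2001, 2019))

# Positional selection with the boolean mask along the time dimension.
result = ds.isel(time=mask.values)
```

Because the mask is applied by position with isel, this variant does not require days to be set as an index first.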

spencerkclark
  • The issue with leap year can be bypassed by using `pd.date_range()` along with a list comprehension that loops through `year_range`. `date_range = [pd.date_range(start= f'{year}-05-30',end = f'{year}-08-18',freq='d') for year in year_range]` Then, flatten the list `date_range = [item for sublist in date_range for item in sublist]`. Lastly, `ds.sel(time=date_range)` – MorningGlory Jul 21 '21 at 16:01