I have monthly time series data that I want to resample into two seasonal groups of unequal length, 5 months (DJFMA) and 7 months (MJJASON), and then find the maximum value at each grid point within each group. Here is what I have, but it does not do what I want:

my_data.resample(time='2QS-NOV').max(dim='time')

Thank you!

Michael Delgado
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Feb 25 '22 at 14:29
  • Hi there! I edited your question title a bit to clarify that it's about resampling to variable-length frequency groups rather than about chunking. If you were wondering how to do this while maintaining chunks along the time dimension, then my bad. Please feel free to edit the question to include the current and desired chunking scheme, memory requirements, etc. – Michael Delgado Feb 25 '22 at 18:34

1 Answer

You can use the .dt accessor (xarray's DatetimeAccessor) of any datetime coordinate to define your own grouper, then use groupby instead of resample to work with custom, variable-length resampling frequencies.

As an example, I'll set up a dummy dataset with 4 years of daily data:

In [1]: import pandas as pd, xarray as xr, numpy as np

In [2]: da = xr.DataArray(
   ...:     np.arange(365 * 4 + 1),
   ...:     dims=["time"],
   ...:     coords=[pd.date_range("2020-01-01", freq="D", periods=(365 * 4 + 1))],
   ...: )

In [3]: da
Out[3]:
<xarray.DataArray (time: 1461)>
array([   0,    1,    2, ..., 1458, 1459, 1460])
Coordinates:
  * time     (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2023-12-31

You can access the month as an integer from 1 to 12 using the .dt.month accessor:

In [4]: da.time.dt.month
Out[4]:
<xarray.DataArray 'month' (time: 1461)>
array([ 1,  1,  1, ..., 12, 12, 12])
Coordinates:
  * time     (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2023-12-31

You can treat this as an array in its own right and build any condition you want, for example a mask that is True for May through November:

In [5]: (da.time.dt.month > 4) & (da.time.dt.month < 12)
Out[5]:
<xarray.DataArray 'month' (time: 1461)>
array([False, False, False, ..., False, False, False])
Coordinates:
  * time     (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2023-12-31
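If you'd rather list the months explicitly, the same mask can be written with isin. This is just a sketch that rebuilds a daily array like the one above; the names warm_months, is_warm and same_mask are mine, not part of xarray:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild a daily dummy array covering 2020-01-01 through 2023-12-31
time = pd.date_range("2020-01-01", freq="D", periods=365 * 4 + 1)
da = xr.DataArray(np.arange(time.size), dims=["time"], coords=[time])

# Hypothetical name: the months that belong to the MJJASON group
warm_months = [5, 6, 7, 8, 9, 10, 11]

# True where the timestamp falls in May through November
is_warm = da.time.dt.month.isin(warm_months)

# The comparison-based mask from above, for reference
same_mask = (da.time.dt.month > 4) & (da.time.dt.month < 12)
```

Both masks select the same days. The isin version is handy when the months in a group are not contiguous.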

I'll build on this to make a string of the format YYYY-{monthgroup}, making sure to include December in the next year's group:

In [13]: grouper = (
    ...:     xr.where(da.time.dt.month == 12, (da.time.dt.year + 1), da.time.dt.year)
    ...:     .astype(str)
    ...:     .astype("O")
    ...:     + "-"
    ...:     + xr.where((da.time.dt.month > 4) & (da.time.dt.month < 12), "MJJASON", "DJFMA")
    ...: )

We can use this grouper to resample the data along the time dimension:

In [14]: da.groupby(grouper).max(dim="time").sortby("group")
Out[14]:
<xarray.DataArray (group: 9)>
array([ 120,  334,  485,  699,  850, 1064, 1215, 1429, 1460])
Coordinates:
  * group    (group) object '2020-DJFMA' '2020-MJJASON' ... '2024-DJFMA'

Note that the first and last groups are missing months because the data doesn't align cleanly with the December through November seasonal scheme. You may want to drop these depending on your goals.
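To tie this back to the question, here is a rough sketch of the same idea on monthly gridded data, including one way to drop the incomplete first and last groups. The array my_data below is made-up random data on a tiny lat/lon grid, and the names season_group, expected_months and seasonal_max_complete are my own. I build the labels with plain numpy here, which is equivalent to the xr.where approach above:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up monthly data: 4 years (2020-01 .. 2023-12) on a 2 x 3 grid
time = pd.date_range("2020-01-01", freq="MS", periods=48)
rng = np.random.default_rng(0)
my_data = xr.DataArray(
    rng.random((time.size, 2, 3)),
    dims=["time", "lat", "lon"],
    coords={"time": time, "lat": [10.0, 20.0], "lon": [0.0, 1.0, 2.0]},
)

month = my_data.time.dt.month.values
year = my_data.time.dt.year.values

# December is counted towards the following year's DJFMA group
season_year = np.where(month == 12, year + 1, year)
season = np.where((month > 4) & (month < 12), "MJJASON", "DJFMA")
labels = [f"{y}-{s}" for y, s in zip(season_year, season)]

# Named grouper, so the new dimension is called "season_group"
grouper = xr.DataArray(
    labels, dims=["time"], coords={"time": my_data.time}, name="season_group"
)

# Maximum over time within each group, for every grid point
seasonal_max = my_data.groupby(grouper).max(dim="time")

# Keep only the groups that contain all of their months
expected_months = {"DJFMA": 5, "MJJASON": 7}
counts = pd.Series(labels).value_counts()
complete = [
    str(g)
    for g in seasonal_max["season_group"].values
    if counts[str(g)] == expected_months[str(g).split("-")[1]]
]
seasonal_max_complete = seasonal_max.sel(season_group=complete)
```

With this input, 2020-DJFMA (no December 2019) and 2024-DJFMA (December 2023 only) are dropped, and the remaining groups keep the lat and lon dimensions.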

Michael Delgado