1

I'm currently working with daily climate data (ERA5) and am analysing it with xarray.

da

<xarray.DataArray (time: 8036)>
dask.array<stack, shape=(8036,), dtype=float32, chunksize=(1,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 1979-01-01 1979-01-02 ... 2000-12-31

I want to calculate the daily anomaly of this data from the local monthly mean of the time-series. That is, I want to take away the average of (eg) January 1979 from all the days in January 1979. And I'd like to do this for every month of every year in my array.

I don't think there is a simple way to do this with xarray at the moment, but I'd appreciate any workarounds people can find! :)

Andrew Williams
  • 741
  • 8
  • 18

2 Answers2

2

A slightly niftier method, which may give you an idea for how to extend things to other kinds of anomalies is:

da = da.assign_coords(year_month=da.time.dt.strftime("%Y-%m"))
result = da.groupby("year_month") - da.groupby("year_month").mean("time")
spencerkclark
  • 1,869
  • 1
  • 12
  • 9
  • Oooh, this looks much nicer. I'll try this later today and let you know :) thanks – Andrew Williams Apr 24 '20 at 09:45
  • Works like a charm! I had no idea the `.dt` accessor existed but this makes things a lot easier! thanks again – Andrew Williams Apr 24 '20 at 10:29
  • One quick question, if you wanted to extend something like this to weekly anomalies for example, you could use something like `.strftime("%Y-%M-%V")` instead? – Andrew Williams Apr 24 '20 at 10:39
  • 1
    Sure thing! For weekly anomalies I think something like `.strftime("%Y-%W")` would be sufficient, since `"%W"` denotes the "week number of the year." But, yeah, I think you get the idea! The key is in generating the groupby coordinate. Sometimes `strftime` is good for that; sometimes you need to be a little more creative. – spencerkclark Apr 24 '20 at 11:08
0

Ok I think I've found an answer if anyone's interested, I'll leave the question open for a while though because this is a very ugly solution in my opinion and it'd be nice if there was something better than what I can cobble together!

da_cp = da.copy().load()

for year in ['1979', '1980', '1981', ..., '1998', '1999', '2000']:
    for month in ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']:
        da_cp.loc[f'{year}-{month}'] -= da_cp.loc[f'{year}-{month}'].mean('time')

I've tested this numerically and it gives the correct answer, however it's not very flexible at the moment and I'd like to see this extended to 'local seasonal anomalies' as well.

Andrew Williams
  • 741
  • 8
  • 18