I have an xarray dataset:
ds
<xarray.Dataset>
Dimensions: (lat: 360, lon: 720, time: 3652)
Coordinates:
* lon (lon) float32 -179.75 -179.25 -178.75 -178.25 -177.75 -177.25 ...
* lat (lat) float32 89.75 89.25 88.75 88.25 87.75 87.25 86.75 86.25 ...
* time (time) datetime64[ns] 2010-01-01 2010-01-02 2010-01-03 ...
Data variables:
dis (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan...
There are NaNs in the dis variable, but the array is not entirely NaN. The time dimension holds 10 years of daily data (3652 days).
What I want is the monthly mean over the 10-year time series, for each calendar month and each grid square (lat, lon). So the output dataset would have:
Dimensions: (lat: 360, lon: 720, time: 12) #<<< or 'months'
One option I saw that almost does what I want is:
ds.dis.groupby('time.month').mean()
However, the output of this is just a 12-item array, i.e. both the lat and lon dimensions are lost:
<xarray.DataArray 'dis' (month: 12)>
array([ 368.26764123, 394.0543304 , 424.67056092, 476.94943773,
522.383195 , 516.37355647, 497.74700652, 472.46993274,
456.87268206, 402.44729131, 367.41928436, 362.6121917 ])
Coordinates:
* month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
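One possibility, sketched here on a small synthetic dataset rather than the real one, is to tell the reduction which dimension to collapse. The assumption is that mean() with no arguments averages over every dimension in the xarray version that produced the output above, and that passing dim='time' restricts it to time within each month group. The grid size, the random values and the all-NaN cell are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic stand-in for the real dataset (sizes and values are made up).
time = pd.date_range("2010-01-01", "2011-12-31", freq="D")
lat = np.array([89.75, 89.25, 88.75], dtype="float32")
lon = np.array([-179.75, -179.25, -178.75, -178.25], dtype="float32")
rng = np.random.default_rng(0)
values = rng.random((time.size, lat.size, lon.size))
values[:, 0, 0] = np.nan  # one grid square that is NaN at every time step

ds = xr.Dataset(
    {"dis": (("time", "lat", "lon"), values)},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Group by calendar month, then average over time only,
# so lat and lon are kept in the result.
monthly = ds.dis.groupby("time.month").mean(dim="time", skipna=True)
print(monthly.sizes)
```

If this behaves as assumed, the result has a month dimension of length 12 alongside lat and lon, and grid squares that are NaN at every time step stay NaN.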
I figure there are probably simple ways to do this using the datetime64 methods, but I have struggled to make full sense of them.
While writing this, I did manage to get the result I want with:
stacked = xr.concat([ds.dis[tlist[month,:],:,:].mean(dim='time',skipna=True) for month in range(0,12)],dim='month')
which gives:
<xarray.DataArray 'dis' (month: 12, lat: 360, lon: 720)>
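For a self-contained version of that workaround, the sketch below assumes tlist is a (12, ntime) boolean mask with one row per calendar month. That definition is hypothetical, since tlist is not shown above, and the small synthetic dataset is made up as well.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small synthetic stand-in for the real dataset (sizes and values are made up).
time = pd.date_range("2010-01-01", "2011-12-31", freq="D")
rng = np.random.default_rng(1)
values = rng.random((time.size, 3, 4))
ds = xr.Dataset(
    {"dis": (("time", "lat", "lon"), values)},
    coords={
        "time": time,
        "lat": [89.75, 89.25, 88.75],
        "lon": [-179.75, -179.25, -178.75, -178.25],
    },
)

# Hypothetical definition of tlist: row m is True where the time step
# falls in calendar month m + 1.
months = ds["time"].dt.month.values
tlist = np.stack([months == m for m in range(1, 13)])  # shape (12, ntime)

# The workaround: one time-mean per month, concatenated along a new dimension.
stacked = xr.concat(
    [ds.dis[tlist[month, :], :, :].mean(dim="time", skipna=True)
     for month in range(0, 12)],
    dim="month",
)
print(stacked.dims)
```

Note that the month dimension built this way carries no coordinate labels, so positions 0 to 11 stand for January to December.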
However, is there a more pythonic way to do this, closer to the groupby one-liner above?
Thanks