I have a typical xarray Dataset holding 38 years of monthly data (456 time steps).
I want to calculate quantile values for each calendar month separately.
<xarray.Dataset>
Dimensions: (lat: 26, lon: 71, time: 456)
Coordinates:
* lat (lat) float32 25.0 26.0 27.0 28.0 29.0 30.0 31.0 32.0 ...
* lon (lon) float32 -130.0 -129.0 -128.0 -127.0 -126.0 -125.0 ...
* time (time) datetime64[ns] 1979-01-31 1979-02-28 1979-03-31 ...
Data variables:
var1 (time, lat, lon) float32 nan nan nan nan nan nan nan nan ...
var2 (time, lat, lon) float32 nan nan nan nan nan nan nan nan ...
var3 (time, lat, lon) float32 nan nan nan nan nan nan nan nan ...
......
For example, if I want the mean for each month I use:
ds.groupby('time.month').mean(dim='time')
But if I try
ds.groupby('time.month').quantile(0.75, dim='time')
I get
AttributeError: 'DatasetGroupBy' object has no attribute 'quantile'
However, according to the pandas documentation, quantile works on groupby objects.
In fact, I tried the following:
df_ds = ds.to_dataframe()
df_ds = df_ds.reset_index()
df_ds = df_ds.set_index('time')
# pd.Grouper replaces pd.TimeGrouper, which was deprecated in later pandas releases
df_ds.groupby(pd.Grouper(freq='M')).quantile(0.75)
This works, but it is a much simpler case because there is only one index. If I skip the reset_index/set_index step and keep the MultiIndex that to_dataframe produces, pandas raises an error saying it cannot handle a MultiIndex.
So, can xarray do this directly, perhaps with some apply/lambda combination?
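To make the apply/lambda idea concrete, here is a minimal sketch. It assumes the installed xarray provides Dataset.quantile and a groupby(...).map method (older releases call the same method apply). The toy dataset and the helper name month_quantile are hypothetical stand-ins, not my real data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: 38 years of monthly steps on a tiny lat/lon grid.
time = pd.date_range("1979-01-01", periods=456, freq="MS")
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"var1": (("time", "lat", "lon"), rng.random((456, 3, 4)))},
    coords={
        "time": time,
        "lat": [25.0, 26.0, 27.0],
        "lon": [-130.0, -129.0, -128.0, -127.0],
    },
)


def month_quantile(group, q=0.75):
    """Quantile over time within a single calendar-month group."""
    return group.quantile(q, dim="time")


# Apply the per-group function to each calendar month and recombine.
# In older xarray releases, use .apply(...) instead of .map(...).
q75 = ds.groupby("time.month").map(month_quantile)
```

The result should carry a month dimension of length 12 in place of time, alongside lat and lon.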
I found a very inelegant workaround. It is feasible because I have only 4 variables (I could loop over the variable names, but I don't here):
Data_clim_monthly_75g = ds.where(iok_conus_xarray).groupby('time.month')
Data_clim_monthly_75 = ds.where(iok_conus_xarray).groupby('time.month').mean(dim='time')
v1 = Data_clim_monthly_75['var1'].values
v2 = Data_clim_monthly_75['var2'].values
v3 = Data_clim_monthly_75['var3'].values
v4 = Data_clim_monthly_75['var4'].values
for k, gp in Data_clim_monthly_75g:
    v1[k-1] = np.nanpercentile(gp['var1'].values, q=75, axis=0)
    v2[k-1] = np.nanpercentile(gp['var2'].values, q=75, axis=0)
    v3[k-1] = np.nanpercentile(gp['var3'].values, q=75, axis=0)
    v4[k-1] = np.nanpercentile(gp['var4'].values, q=75, axis=0)
Data_clim_monthly_75['var1'] = (('month','lat','lon'),v1)
Data_clim_monthly_75['var2'] = (('month','lat','lon'),v2)
Data_clim_monthly_75['var3'] = (('month','lat','lon'),v3)
Data_clim_monthly_75['var4'] = (('month','lat','lon'),v4)
I basically work around xarray. I still would love a solution within xarray.
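For reference, the same workaround can be written as a loop over the variable names, as mentioned above. This is only a sketch: the function name monthly_percentile is hypothetical, and it assumes every data variable has dimensions (time, lat, lon).

```python
import numpy as np
import pandas as pd
import xarray as xr


def monthly_percentile(ds, q=75):
    """Per-calendar-month percentile of every data variable, ignoring NaNs.

    Assumes each data variable has dimensions (time, lat, lon).
    """
    months = []
    layers = {name: [] for name in ds.data_vars}
    # Iterating a groupby object yields (label, group) pairs, one per month.
    for month, group in ds.groupby("time.month"):
        months.append(int(month))
        for name in ds.data_vars:
            # Collapse the time axis (axis 0) of this month's slice.
            layers[name].append(
                np.nanpercentile(group[name].values, q=q, axis=0)
            )
    return xr.Dataset(
        {
            name: (("month", "lat", "lon"), np.stack(layers[name]))
            for name in ds.data_vars
        },
        coords={
            "month": months,
            "lat": ds["lat"].values,
            "lon": ds["lon"].values,
        },
    )
```

With my data this would be called as Data_clim_monthly_75 = monthly_percentile(ds.where(iok_conus_xarray)). It still steps outside xarray for the actual computation.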