My goal is to calculate a quadratic fit over the time dimension of the 'zos' (sea level) data of the piControl runs of various CMIP6 models, and then subtract that fit from other runs. For this particular example, the data looks as follows:
Out[108]:
<xarray.Dataset>
Dimensions:    (bnds: 2, ncells: 830305, time: 3, vertices: 16)
Coordinates:
  * time       (time) object 2401-01-16 12:00:00 ... 2401-03-16 12:00:00
    lon        (ncells) float64 dask.array<chunksize=(5000,), meta=np.ndarray>
    lat        (ncells) float64 dask.array<chunksize=(5000,), meta=np.ndarray>
Dimensions without coordinates: bnds, ncells, vertices
Data variables:
    time_bnds  (time, bnds) object dask.array<chunksize=(3, 2), meta=np.ndarray>
    lon_bnds   (ncells, vertices) float64 dask.array<chunksize=(5000, 16), meta=np.ndarray>
    lat_bnds   (ncells, vertices) float64 dask.array<chunksize=(5000, 16), meta=np.ndarray>
    zos        (time, ncells) float32 dask.array<chunksize=(3, 5000), meta=np.ndarray>
I tried using xarray.Dataset.polyfit(), but that results in AttributeError: 'Dataset' object has no attribute 'polyfit'. Presumably this is because the CMIP6 models use irregular calendars and model years outside the supported numpy.datetime64 range (xarray warns: 'Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range'). This is also why I'm using decode_times=False in the snippet below.
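For reference, the polyfit-based workflow I had in mind looks roughly like the sketch below. Dataset/DataArray.polyfit and xr.polyval are the documented xarray API from more recent releases than the one I apparently have, so this is intent rather than code that runs on my setup; the variable names (coeffs, trend, zos_detrended) are just illustrative, and mycmip6 refers to the dataset opened in the snippet further down:

# Sketch only: quadratic fit along time with the built-in polyfit,
# then evaluate the fit and subtract it (needs a recent xarray).
coeffs = mycmip6.zos.polyfit(dim="time", deg=2)                # Dataset with 'polyfit_coefficients'
trend = xr.polyval(mycmip6.time, coeffs.polyfit_coefficients)  # quadratic evaluated at each time step
zos_detrended = mycmip6.zos - trend                            # the fit subtracted from the data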
Alternatively, I've tried this solution:
import numpy as np
import xarray as xr

xr.set_options(display_style="html")  # fancy HTML repr

mycmip6 = (
    xr.open_dataset(
        '/Volumes/Naamloos/PhD_Data/CMIP6/raw_mergetime/zos/AWI-CM-1-1-MR/zos_Omon_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn_240101-290012.nc',
        decode_times=False,
        chunks={'ncells': 5000},
    )
    .isel(time=slice(3))  # using a small slice for the example
)
mycmip6

def fit_quadratic(data, time):
    # fit a 2nd-degree polynomial along time and return its values at each time step
    pfit = np.polyfit(time, data, 2)
    return np.transpose(np.polyval(pfit, time))

pfit = xr.apply_ufunc(
    fit_quadratic,   # first the function
    mycmip6.zos,     # sea level data
    mycmip6.time,    # time
    input_core_dims=[["time"], ["time"]],  # list with one entry per arg
    output_core_dims=[["time"]],           # returned data has one core dimension
    vectorize=True,  # loop over non-core dims
    dask="parallelized",
    output_dtypes=[mycmip6.zos.dtype],
)
pfit.compute()
I run into this error:
TypeError: wrong number of positional arguments: expected 2, got 3
When I don't use chunking at all, the same apply_ufunc call works and I can use 'pfit', but then I run into memory issues for the largest datasets; see the sketch below.
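For completeness, the unchunked variant that does work for me is essentially the same call without the chunks argument and without dask="parallelized" (same file and fit_quadratic as above; the _nochunk names are only for illustration):

mycmip6_nochunk = (
    xr.open_dataset(
        '/Volumes/Naamloos/PhD_Data/CMIP6/raw_mergetime/zos/AWI-CM-1-1-MR/zos_Omon_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn_240101-290012.nc',
        decode_times=False,  # everything loads as plain numpy arrays, no dask
    )
    .isel(time=slice(3))
)

pfit_nochunk = xr.apply_ufunc(
    fit_quadratic,
    mycmip6_nochunk.zos,
    mycmip6_nochunk.time,
    input_core_dims=[["time"], ["time"]],
    output_core_dims=[["time"]],
    vectorize=True,  # np.vectorize loops over ncells; fine for this slice, too heavy in memory for the full data
)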
What am I doing wrong here?