My goal is to calculate a quadratic fit over the time dimension of the 'zos' (sea level) data of the piControl runs of various CMIP6 models, and then subtract that fit from other runs. For this particular example, the data looks as follows:
Out[108]:
<xarray.Dataset>
Dimensions:    (bnds: 2, ncells: 830305, time: 3, vertices: 16)
Coordinates:
  * time       (time) object 2401-01-16 12:00:00 ... 2401-03-16 12:00:00
    lon        (ncells) float64 dask.array<chunksize=(5000,), meta=np.ndarray>
    lat        (ncells) float64 dask.array<chunksize=(5000,), meta=np.ndarray>
Dimensions without coordinates: bnds, ncells, vertices
Data variables:
    time_bnds  (time, bnds) object dask.array<chunksize=(3, 2), meta=np.ndarray>
    lon_bnds   (ncells, vertices) float64 dask.array<chunksize=(5000, 16), meta=np.ndarray>
    lat_bnds   (ncells, vertices) float64 dask.array<chunksize=(5000, 16), meta=np.ndarray>
    zos        (time, ncells) float32 dask.array<chunksize=(3, 5000), meta=np.ndarray>
I tried using xarray.Dataset.polyfit(), but that results in AttributeError: 'Dataset' object has no attribute 'polyfit'. Presumably this is because the CMIP6 models use irregular calendars and model years outside the supported numpy.datetime64 range (xarray warns: 'Unable to decode time axis into full numpy.datetime64 objects, continuing using cftime.datetime objects instead, reason: dates out of range'). This is also why I'm using decode_times=False in the snippet below.
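For reference, the polyfit-based workflow I had in mind looks roughly like the sketch below. Dataset/DataArray.polyfit and xr.polyval are the documented xarray API from more recent releases than the one I apparently have, so this is intent rather than code that runs on my setup; the variable names (coeffs, trend, zos_detrended) are just illustrative, and mycmip6 refers to the dataset opened in the snippet further down:

# Sketch only: quadratic fit along time with the built-in polyfit,
# then evaluate the fit and subtract it (needs a recent xarray).
coeffs = mycmip6.zos.polyfit(dim="time", deg=2)                # Dataset with 'polyfit_coefficients'
trend = xr.polyval(mycmip6.time, coeffs.polyfit_coefficients)  # quadratic evaluated at each time step
zos_detrended = mycmip6.zos - trend                            # the fit subtracted from the data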
Alternatively, I've tried this solution:
import numpy as np
import xarray as xr

xr.set_options(display_style="html")  # fancy HTML repr

mycmip6 = (
    xr.open_dataset(
        '/Volumes/Naamloos/PhD_Data/CMIP6/raw_mergetime/zos/AWI-CM-1-1-MR/zos_Omon_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn_240101-290012.nc',
        decode_times=False,
        chunks={'ncells': 5000},
    )
    .isel(time=slice(3))  # using a small slice for the example
)
mycmip6

def fit_quadratic(data, time):
    # fit a 2nd-degree polynomial along time and return its values at each time step
    pfit = np.polyfit(time, data, 2)
    return np.transpose(np.polyval(pfit, time))

pfit = xr.apply_ufunc(
    fit_quadratic,   # first the function
    mycmip6.zos,     # sea level data
    mycmip6.time,    # time
    input_core_dims=[["time"], ["time"]],  # list with one entry per arg
    output_core_dims=[["time"]],           # returned data has one core dimension
    vectorize=True,  # loop over non-core dims
    dask="parallelized",
    output_dtypes=[mycmip6.zos.dtype],
)
pfit.compute()
I run into this error:
TypeError: wrong number of positional arguments: expected 2, got 3
When I don't use chunking at all, the same apply_ufunc call works and I can use 'pfit', but then I run into memory issues for the largest datasets; see the sketch below.
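For completeness, the unchunked variant that does work for me is essentially the same call without the chunks argument and without dask="parallelized" (same file and fit_quadratic as above; the _nochunk names are only for illustration):

mycmip6_nochunk = (
    xr.open_dataset(
        '/Volumes/Naamloos/PhD_Data/CMIP6/raw_mergetime/zos/AWI-CM-1-1-MR/zos_Omon_AWI-CM-1-1-MR_piControl_r1i1p1f1_gn_240101-290012.nc',
        decode_times=False,  # everything loads as plain numpy arrays, no dask
    )
    .isel(time=slice(3))
)

pfit_nochunk = xr.apply_ufunc(
    fit_quadratic,
    mycmip6_nochunk.zos,
    mycmip6_nochunk.time,
    input_core_dims=[["time"], ["time"]],
    output_core_dims=[["time"]],
    vectorize=True,  # np.vectorize loops over ncells; fine for this slice, too heavy in memory for the full data
)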
What am I doing wrong here?