xarray now has a polyfit method (see xr.DataArray.polyfit and xr.Dataset.polyfit). It integrates well with dask and performs reasonably well even for large arrays.
Example
In [1]: import numpy as np
   ...: import pandas as pd
   ...: import xarray as xr
   ...: import dask.array as dda

In [2]: # generate a large random dask array
   ...: arr = dda.random.random((100, 1000, 1000), chunks=(-1, 100, 100)).astype('float32')

In [3]: da = xr.DataArray(arr, dims=['time', 'y', 'x'],
   ...:                   coords=[pd.date_range('1950-01-01', freq='Y', periods=100),
   ...:                           np.arange(1000), np.arange(1000)])
In [4]: da
Out[4]:
<xarray.DataArray 'astype-5e6d16cb0909caa0098a5c92f0770801' (time: 100,
y: 1000, x: 1000)>
dask.array<astype, shape=(100, 1000, 1000), dtype=float32, chunksize=(100, 100, 100), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 1950-12-31 1951-12-31 ... 2049-12-31
* y (y) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
* x (x) int64 0 1 2 3 4 5 6 7 8 ... 991 992 993 994 995 996 997 998 999
Call polyfit along the time dimension with deg=1 for a linear least squares regression:
In [5]: da.polyfit('time', deg=1)
Out[5]:
<xarray.Dataset>
Dimensions: (degree: 2, y: 1000, x: 1000)
Coordinates:
* degree (degree) int64 1 0
* y (y) int64 0 1 2 3 4 5 6 ... 994 995 996 997 998 999
* x (x) int64 0 1 2 3 4 5 6 ... 994 995 996 997 998 999
Data variables:
polyfit_coefficients (degree, y, x) float64 dask.array<chunksize=(2, 10, 1000), meta=np.ndarray>
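Note that the degree coordinate runs from highest to lowest power (as with np.polyfit), so for deg=1 the slope sits at degree=1 and the intercept at degree=0. A minimal in-memory sketch with a known trend:

```python
import numpy as np
import xarray as xr

# a small array with a known linear trend: y = 2*t + 1
t = np.arange(10, dtype=float)
da = xr.DataArray(2.0 * t + 1.0, dims=['time'], coords={'time': t})

coeffs = da.polyfit('time', deg=1)
# select individual coefficients from the 'degree' dimension
slope = coeffs['polyfit_coefficients'].sel(degree=1).item()      # ~2.0
intercept = coeffs['polyfit_coefficients'].sel(degree=0).item()  # ~1.0
```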
Full workflow
First, assign your year values as a coordinate on the array, and use swap_dims to replace 'time' with the annual values (for proper scaling of the coefficients).
In [13]: year = np.arange(1950, 2050)
In [14]: da.coords['year'] = ('time', year)
In [15]: da = da.swap_dims({'time': 'year'})
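These two steps can be sketched on a small self-contained array (the timestamps here are illustrative; the dt.year accessor saves you from constructing the year values by hand):

```python
import numpy as np
import pandas as pd
import xarray as xr

# toy array with a datetime 'time' dimension (illustrative values)
time = pd.to_datetime([f'{y}-12-31' for y in range(1950, 1955)])
da = xr.DataArray(np.arange(5.0), dims=['time'], coords={'time': time})

# attach the integer years as a non-dimension coordinate ...
da = da.assign_coords(year=('time', da['time'].dt.year.values))
# ... then make 'year' the indexing dimension in place of 'time'
da = da.swap_dims({'time': 'year'})
```

After the swap, 'time' is kept as a non-dimension coordinate along 'year', which is why it still shows up in later outputs.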
Now you can call polyfit to get the trend in [units/year]:
In [16]: coeffs = da.polyfit('year', deg=1)
xr.polyval can be used to predict values from the resulting coefficients:
In [17]: pred = xr.polyval(da.year, coeffs.chunk({'x': 100, 'y': 100}))
In [18]: pred
Out[18]:
<xarray.Dataset>
Dimensions: (y: 1000, x: 1000, year: 100)
Coordinates:
* y (y) int64 0 1 2 3 4 5 6 ... 994 995 996 997 998 999
* x (x) int64 0 1 2 3 4 5 6 ... 994 995 996 997 998 999
* year (year) int64 1950 1951 1952 1953 ... 2047 2048 2049
Data variables:
polyfit_coefficients (year, y, x) float64 dask.array<chunksize=(100, 100, 100), meta=np.ndarray>
And the result can be used to calculate the unexplained variation:
In [19]: unexplained = da - pred
In [20]: unexplained
Out[20]:
<xarray.Dataset>
Dimensions: (y: 1000, x: 1000, year: 100)
Coordinates:
* y (y) int64 0 1 2 3 4 5 6 ... 994 995 996 997 998 999
* x (x) int64 0 1 2 3 4 5 6 ... 994 995 996 997 998 999
* year (year) int64 1950 1951 1952 1953 ... 2047 2048 2049
time (year) datetime64[ns] 1950-12-31 ... 2049-12-31
Data variables:
polyfit_coefficients (year, y, x) float64 dask.array<chunksize=(100, 100, 100), meta=np.ndarray>
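From here you can turn the unexplained variation into a fraction-of-variance-explained (R²-style) score. A small in-memory sketch, assuming only the polyfit_coefficients variable name that polyfit/polyval produce:

```python
import numpy as np
import xarray as xr

# noisy linear data: y = 3*x + noise
x = np.arange(20, dtype=float)
rng = np.random.default_rng(0)
da = xr.DataArray(3.0 * x + rng.normal(0.0, 0.5, x.size),
                  dims=['year'], coords={'year': x})

coeffs = da.polyfit('year', deg=1)
pred = xr.polyval(da['year'], coeffs)['polyfit_coefficients']
unexplained = da - pred

# R^2 = 1 - SS_residual / SS_total
r2 = 1 - (unexplained ** 2).sum('year') / ((da - da.mean('year')) ** 2).sum('year')
```

For the gridded case above, the same expression with sum('year') yields a per-pixel (y, x) map of explained variance.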