I would like to initialize a very large XArray dataset (ideally as an on-disk Zarr store) for later processing; various parts (spatial subsets) of the dataset will be populated by a different script.
The naive approach below won't work, because the full array obviously doesn't fit into memory:
import numpy as np
import xarray as xr
xr_lons = xr.DataArray(np.arange(-180, 180, 0.001), dims=['x'], name='lons')
xr_lats = xr.DataArray(np.arange(90, -90, -0.001), dims=['y'], name='lats')
xr_da = xr.DataArray(0, dims=['y', 'x'], coords=[xr_lats, xr_lons])
xr_ds = xr.Dataset({"test": xr_da})
xr_ds.to_zarr("test.zarr", mode="w")
MemoryError: Unable to allocate 483. GiB for an array with shape (180000, 360000) and data type int64
What would be a good alternative?
Using plain zarr, I could do something like this:
import zarr
nrows, ncols = 180000, 360000  # same grid as above
root = zarr.open('example.zarr', mode='w')
mosaics = root.create_group('mosaics')
dsm = mosaics.create_dataset('dsm', shape=(nrows, ncols), chunks=(1024, 1024), dtype='i4')
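For context, a separate script would then fill in one spatial window of that store, e.g. like this (a sketch; the tile size and slice bounds are made up):

import numpy as np
import zarr
# reopen the existing store read/write and write a single 1024 x 1024 tile
root = zarr.open('example.zarr', mode='r+')
tile = np.ones((1024, 1024), dtype='i4')  # stand-in for real data
root['mosaics/dsm'][0:1024, 0:1024] = tile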
That dataset, however, doesn't conform to the normal XArray structure and metadata, so I'm looking for a solution directly using XArray (or Dask?).
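Something along these lines is roughly what I'm hoping for: a lazy dask array wrapped in XArray, written out with compute=False so that only the store layout, metadata and coordinates are created (a sketch, assuming compute=False skips writing the dask-backed data, which is my reading of the docs):

import dask.array as da
import numpy as np
import xarray as xr

xr_lons = xr.DataArray(np.arange(-180, 180, 0.001), dims=['x'], name='lons')
xr_lats = xr.DataArray(np.arange(90, -90, -0.001), dims=['y'], name='lats')
# lazy, chunked zeros instead of a dense in-memory array
data = da.zeros((xr_lats.size, xr_lons.size), chunks=(1024, 1024), dtype='i4')
xr_da = xr.DataArray(data, dims=['y', 'x'], coords=[xr_lats, xr_lons])
xr_ds = xr.Dataset({"test": xr_da})
# write only the metadata and coordinates; the chunk data stays unwritten
xr_ds.to_zarr("test.zarr", mode="w", compute=False)

The other scripts would then presumably write their subsets with to_zarr(..., region={'y': ..., 'x': ...}), but I'm not sure this is the recommended pattern, or whether there is a better way.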