I have a Zarr store that I'd like to convert to NetCDF, but the dataset is too large to fit in memory, so I'm trying to write it out in chunks. My computer has 32 GB of RAM, so writing ~5.5 GB chunks shouldn't be a problem. However, within seconds of running the script below, memory usage tops out at the ~20 GB that is available and the script fails.
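For reference, a quick back-of-the-envelope check of the per-chunk footprint (a sketch; the float64 dtype and chunk shape are taken from the traceback below):

# One chunk of shape (time=30, outlat=3500, outlon=7000) at 8 bytes per float64 value
bytes_per_chunk = 30 * 3500 * 7000 * 8
print(bytes_per_chunk / 2**30)  # ~5.48 GiB, matching the error message below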
Data: Dropbox link to the Zarr store, which contains radar rainfall data for 6/28/2014 over the United States and is around 1.8 GB in total.
Code:
import xarray as xr
import zarr

fpath_zarr = "out_zarr_20140628.zarr"

# Open the store lazily, requesting chunks of 30 x 3500 x 7000 (~5.5 GB each)
ds_from_zarr = xr.open_zarr(store=fpath_zarr, chunks={"outlat": 3500, "outlon": 7000, "time": 30})

# Write to NetCDF, compressing the rainrate variable with zlib (this is where the MemoryError is raised)
ds_from_zarr.to_netcdf("ds_zarr_to_nc.nc", encoding={"rainrate": {"zlib": True}})
Output:
MemoryError: Unable to allocate 5.48 GiB for an array with shape (30, 3500, 7000) and data type float64
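The allocation in the error matches one full requested chunk of rainrate. A minimal sketch of how that can be double-checked on the opened dataset (assuming rainrate is dask-backed after open_zarr):

import math
import dask.utils

rain = ds_from_zarr["rainrate"].data       # underlying dask array
print(rain.chunksize)                      # expected: (30, 3500, 7000)
per_chunk = math.prod(rain.chunksize) * rain.dtype.itemsize
print(dask.utils.format_bytes(per_chunk))  # expected: ~5.48 GiB per chunk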
Package versions:
dask 2022.7.0
xarray 2022.3.0
zarr 2.8.1