zarr not respecting chunk size from xarray and reverting to original chunk size

Question

I'm opening a zarr store, rechunking it, and writing it back out to a different zarr store. Yet when I open the new store, it doesn't respect the chunk size I wrote. Here is the code and the output from Jupyter. Any idea what I'm doing wrong here?

import xarray as xr

bathy_ds = xr.open_zarr('data/bathy_store')
bathy_ds.elevation

bathy_ds.chunk(5000).elevation

bathy_ds.chunk(5000).to_zarr('data/elevation_store')
new_ds = xr.open_zarr('data/elevation_store')
new_ds.elevation

It is reverting back to the original chunking as if I'm not fully overwriting it or changing some other setting that needs changing.

Val · Accepted Answer · 2021-05-11T13:44:22.080

This seems to be a known issue, and there's a fair bit of discussion about it in the GitHub issue thread and in a recently merged PR.

Basically, each variable carries the chunking it was read with in its .encoding attribute. Calling .chunk() changes the dask chunks but leaves .encoding untouched, so on the second write the chunks stored in ds[var].encoding['chunks'] (if present) are used to write var to zarr, instead of the new dask chunks.

According to the conversation in the GitHub issue, the current best solution is to manually delete the chunk encoding for the variables in question:

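# ds.variables covers coordinates as well as data variables;
# pop() avoids a KeyError for variables without chunk encoding
for var in ds.variables:
    ds[var].encoding.pop('chunks', None)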

However, this seems to be an evolving situation, so it would be worth checking on the progress of the issue before settling on a final solution.

Here's a little example that showcases the issue and solution:

import xarray as xr

# load data and write it with an initial chunking (500 along time)
x = xr.tutorial.load_dataset("air_temperature")
x.chunk({"time":500, "lat":-1, "lon":-1}).to_zarr("zarr1.zarr")

# display initial chunking
xr.open_zarr("zarr1.zarr/").air

# rechunk
y = xr.open_zarr("zarr1.zarr/").chunk({"time": -1})

# display
y.air

# write without modifying .encoding (the stale encoding['chunks'] is reused)
y.to_zarr("zarr2.zarr")

# display: still chunked by 500 along time
xr.open_zarr("zarr2.zarr/").air

# delete encoding and store
del y.air.encoding['chunks']
y.to_zarr("zarr3.zarr")

# display: time is now a single chunk, as requested
xr.open_zarr("zarr3.zarr/").air

Thanks!! This worked well and I'll keep an eye on that pull request to see how it changes in the future. — clifgray, May 11 '21 at 15:43
