
I'm opening a zarr store, rechunking it, and writing it back out to a different zarr store. Yet when I open the new store, it doesn't respect the chunk size I just wrote. Here is the code and the output from Jupyter. Any idea what I'm doing wrong here?

bathy_ds = xr.open_zarr('data/bathy_store')
bathy_ds.elevation


bathy_ds.chunk(5000).elevation


bathy_ds.chunk(5000).to_zarr('data/elevation_store')
new_ds = xr.open_zarr('data/elevation_store')
new_ds.elevation


It reverts to the original chunking, as if I'm not fully overwriting it or there is some other setting that needs changing.

clifgray

1 Answer


This seems to be a known issue; there is a fair bit of discussion in the issue's thread and in a recently merged PR.

Basically, the dataset carries the original chunking around in the .encoding property. So when you call to_zarr on the rechunked dataset, the chunks defined in ds[var].encoding['chunks'] (if present) are used to write var to zarr, rather than the dask chunks you set with .chunk().
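To make the mismatch described above visible before writing, a small helper can list both sets of chunks side by side. This is only a sketch: `chunk_report` is a hypothetical name, not part of xarray, and it assumes nothing beyond the `ds.variables` mapping and the `.chunks` and `.encoding` attributes of each variable.

```python
def chunk_report(ds):
    """Map each variable name to (in-memory dask chunks, chunks stored in .encoding).

    Hypothetical helper, not an xarray API. `ds` is expected to behave like an
    xarray.Dataset: a `.variables` mapping whose values expose `.chunks` and
    `.encoding`.
    """
    report = {}
    for name, var in ds.variables.items():
        # var.chunks is None for variables that are not dask-backed;
        # encoding.get() returns None when no on-disk chunking was recorded.
        report[name] = (var.chunks, var.encoding.get("chunks"))
    return report
```

For the dataset in the question, `chunk_report(bathy_ds.chunk(5000))` should show the new dask chunks for elevation next to the old chunk shape still held in its encoding.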

According to the conversation in the GitHub issue, the best current workaround is to manually delete the chunk encoding for the variables in question:

for var in ds:
    # pop() rather than del, so variables without a 'chunks' entry don't raise KeyError
    ds[var].encoding.pop('chunks', None)

However, this seems to be an evolving situation, so it would be good to check in on the progress and adopt the final solution once it settles.
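Until then, the workaround can be wrapped in a reusable function. The following is a sketch under a few assumptions: `clear_chunk_encoding` is a hypothetical name, it walks `ds.variables` so that coordinates are covered as well as data variables, and it modifies the encoding in place.

```python
def clear_chunk_encoding(ds):
    """Drop the stored 'chunks' encoding from every variable, coordinates included.

    Hypothetical helper, not an xarray API. It mutates the .encoding dicts in
    place and returns the same object so calls can be chained.
    """
    for var in ds.variables.values():
        # pop() with a default tolerates variables that never had the key
        var.encoding.pop("chunks", None)
    return ds
```

With a real dataset this could be chained as `clear_chunk_encoding(ds.chunk(5000)).to_zarr(...)`. All other encoding entries (dtype, compressor, and so on) are left untouched.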

Here's a little example that showcases the issue and solution:

import xarray as xr

# load data and write it with the initial chunking
x = xr.tutorial.load_dataset("air_temperature")
x.chunk({"time":500, "lat":-1, "lon":-1}).to_zarr("zarr1.zarr")

# display initial chunking
xr.open_zarr("zarr1.zarr/").air


# rechunk
y = xr.open_zarr("zarr1.zarr/").chunk({"time": -1})

# display
y.air


# write without modifying .encoding
y.to_zarr("zarr2.zarr")

# display
xr.open_zarr("zarr2.zarr/").air


# delete encoding and store
del y.air.encoding['chunks']
y.to_zarr("zarr3.zarr")

# display
xr.open_zarr("zarr3.zarr/").air


Val
  • Thanks!! This worked well and I'll keep an eye on that pull request to see how it changes in the future. – clifgray May 11 '21 at 15:43