How would I go about creating and returning a file new_zarr.zarr from an xarray Dataset?

I know xarray.Dataset.to_zarr() exists, but it returns a ZarrStore, and I must return a bytes-like object.

I have tried using the tempfile module but am unsure how to proceed. How would I write an xarray.Dataset to a bytes-like object that represents a .zarr file and can be downloaded?

A B

1 Answer

Zarr supports multiple storage backends (DirectoryStore, ZipStore, etc.). If you are looking for a single file object, it sounds like the ZipStore is what you want.

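import xarray as xr
import zarr

ds = xr.tutorial.open_dataset('air_temperature')
# mode='w' creates (or overwrites) the archive
store = zarr.storage.ZipStore('./new_zarr.zip', mode='w')
ds.to_zarr(store)
# Closing the store finalizes the zip file (its central directory is written on close)
store.close()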

The zip file can be thought of as a single file zarr store and can be downloaded (or moved around as a single store).
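Since the question mentions tempfile and needs a bytes-like return value, here is a minimal sketch of one way to get the finished archive back as bytes without leaving a file behind. The helper name `zip_bytes_via_tempfile` and its `write_zip` callback are hypothetical, introduced only for illustration; the callback is assumed to write a complete zip file to the path it receives and to close it before returning.

```python
import os
import tempfile
from typing import Callable


def zip_bytes_via_tempfile(write_zip: Callable[[str], None]) -> bytes:
    """Run write_zip(path) against a temporary path and return the file's bytes.

    Hypothetical helper: write_zip must create a finished (closed) zip
    archive at the given path.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "new_zarr.zip")
        write_zip(path)
        # Read the archive back before the temporary directory is removed.
        with open(path, "rb") as f:
            return f.read()


# Assumed usage with the ZipStore approach (requires xarray and zarr):
#
# def write_dataset(path):
#     store = zarr.storage.ZipStore(path, mode="w")
#     ds.to_zarr(store)
#     store.close()
#
# payload = zip_bytes_via_tempfile(write_dataset)
```

This still touches the disk briefly, so it is only a fit when a temporary file is acceptable.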


Update 1

If you want to do this all in memory, you could extend zarr.ZipStore to allow passing in a BytesIO object:

import os
import zipfile
from threading import RLock

import zarr


# Written against zarr-python 2.x, where zarr.ZipStore is available.
class MyZipStore(zarr.ZipStore):

    def __init__(self, path, compression=zipfile.ZIP_STORED, allowZip64=True, mode='a',
                 dimension_separator=None):

        # store properties
        if isinstance(path, str):  # this is the only change needed to make this work
            path = os.path.abspath(path)
        self.path = path
        self.compression = compression
        self.allowZip64 = allowZip64
        self.mode = mode
        self._dimension_separator = dimension_separator

        # Current understanding is that zipfile module in stdlib is not thread-safe,
        # and so locking is required for both read and write. However, this has not
        # been investigated in detail, perhaps no lock is needed if mode='r'.
        self.mutex = RLock()

        # open zip file (path may be a filesystem path or a file-like object such as BytesIO)
        self.zf = zipfile.ZipFile(path, mode=mode, compression=compression,
                                  allowZip64=allowZip64)

Then you can create the zip file in memory:

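import io

zip_buffer = io.BytesIO()

store = MyZipStore(zip_buffer)

ds.to_zarr(store)
store.close()  # finalize the archive; the BytesIO buffer itself stays open

zip_bytes = zip_buffer.getvalue()  # bytes-like object that can be returned for download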

You'll notice that the zip_buffer contains a valid zip file:

zip_buffer.getvalue()[:10]
b'PK\x03\x04\x14\x00\x00\x00\x00\x00'

(PK\x03\x04 is the Zip file magic number)
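Checking the magic number only confirms how the data starts. As a rough sketch using only the standard library, the bytes can also be opened as an archive and their entries listed. The function name `list_zip_entries` is hypothetical. Note that `zipfile.is_zipfile` looks for the end-of-central-directory record, which is only written when the archive is closed, so bytes taken from a store that was never closed would fail this check.

```python
import io
import zipfile
from typing import List


def list_zip_entries(data: bytes) -> List[str]:
    """Return the entry names of a zip archive held in memory.

    Hypothetical helper: raises ValueError if the bytes are not a
    complete zip archive.
    """
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("data is not a complete zip archive")
    with zipfile.ZipFile(buffer) as zf:
        return zf.namelist()
```

For a Zarr store written this way, the entries would be expected to include metadata keys such as .zgroup and .zattrs alongside the chunk files.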

jhamman
  • Thanks for that. My main objective is to write and return a zarr dataset in memory without having to write to disk. Would I be able to do the following: `z_file = ds.to_zarr(Zarr.MemoryStore())` then `open(shutil.make_archive('file_name', 'zip', z_file), 'rb').read()`? When I try that, I do not seem to get bytes returned. – A B Jan 03 '23 at 15:25
  • I've updated the answer to address the completely-in-memory use case. This should be supported in Zarr directly, and as it turns out, there is already an open issue for this: https://github.com/zarr-developers/zarr-python/issues/1018 – jhamman Jan 03 '23 at 21:35