I've been using HDF5 to store time series data and I want to try using Zarr due to its various features.

I'm working through its tutorial step by step, and it looks like Zarr stores data as directories on the file system rather than as a single file with a hierarchical structure inside it.

For example, when I do this:

import zarr

root = zarr.open('group.zarr', mode='w')
foo = root.create_group('foo')
bar = foo.create_group('bar')
z1 = bar.zeros('baz', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4')

I expected a single file named group.zarr containing 2 groups. Instead, it creates 3 nested directories on the local file system (group.zarr/foo/bar).

I prefer saving arrays in a single file. Is it also possible to create multiple groups and arrays in a single file using Zarr, like HDF5 does?

Ken White
maynull
  • Zarr arrays are generally stored in multiple files even if you're not creating groups. That's what it means when it says "chunked". It's something you see a lot with cloud-optimized file formats as opposed to older formats since it makes parallel access more natural. – Adair Jun 29 '21 at 16:51

1 Answer

Zarr directories can be zipped together into a single Zip file:

7z a -tzip archive.zarr.zip archive.zarr/.
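
If 7z is not available, a rough Python equivalent of the command above can be sketched with the standard library alone. The function name `zip_directory_store` is hypothetical, and storing entries uncompressed (`ZIP_STORED`) is an assumption made here on the grounds that Zarr chunks are usually already compressed; the 7z command itself may apply compression.

```python
import os
import zipfile


def zip_directory_store(src_dir, dest_zip):
    """Pack the contents of src_dir into dest_zip, with paths relative to src_dir."""
    # ZIP_STORED means "no extra compression" (an assumption, see the text above).
    with zipfile.ZipFile(dest_zip, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for filename in filenames:
                full_path = os.path.join(dirpath, filename)
                # Keep entry names relative to the store root,
                # e.g. "foo/bar/baz/.zarray" rather than "archive.zarr/foo/...".
                arcname = os.path.relpath(full_path, src_dir)
                zf.write(full_path, arcname)


# Example call (paths taken from the 7z command above):
# zip_directory_store("archive.zarr", "archive.zarr.zip")
```

The entry names are kept relative to the store root, which mirrors the trailing `/.` in the 7z command.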

The ZipStore can then be used to read the single file, though that should be transparent to users of zarr.open().

Josh