3

Zarr saves an array on disk in chunks, with each chunk stored as a separate file. Is there a way to access only one chosen chunk (file)?

Can it be determined which chunks are empty without loading the whole array into memory?

alex

3 Answers

0

I'm not aware of any way to find the size of individual chunks other than inspecting the file system yourself; Zarr abstracts that away. It would help if you explained what you are trying to achieve.

The project I'm currently working on uses Zarr to store meteorological data. We keep the data in a three-dimensional array of shape (t, x, y). Alongside the data, we have a one-dimensional array of shape (t), effectively a bitmask recording which time slots are filled. When data comes in, we write:

data[t] = [...]
ready[t] = 1

When querying, we then know which time slots hold data and which are empty.
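
A minimal sketch of this bookkeeping pattern follows. The helper names `write_slot` and `filled_slots` are hypothetical, and `data` and `ready` are only assumed to be array-likes indexed along the time axis (Zarr arrays in our case; plain lists stand in for them here).

```python
def write_slot(data, ready, t, values):
    """Store values for time slot t, then mark the slot as filled."""
    data[t] = values
    ready[t] = 1  # set the flag last, so a set flag implies the data was written


def filled_slots(ready):
    """Return the time indices whose flag is set.

    ready[:] reads only the small one-dimensional mask, not the data array.
    """
    return [t for t, flag in enumerate(ready[:]) if flag]


# Illustration with plain lists standing in for the two arrays.
data = [None] * 4
ready = [0] * 4
write_slot(data, ready, 2, [[1.0, 2.0], [3.0, 4.0]])
print(filled_slots(ready))  # [2]
```

The point of the design is that a query only has to read the mask to decide which time slots are worth loading.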

sba
0

You can see which chunks are filled by looking at the keys method of the underlying chunk_store. Only chunks that contain data have a key in the store.

The value stored under each key holds that chunk's data, but in compressed form. If you need more than that, I would encourage you to raise an issue over at the Zarr repo.
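
As a rough sketch of how the keys could be used, the snippet below turns a list of store keys into the set of chunk-grid positions that have no key. It assumes the store holds a single array and uses flat chunk keys such as `0.1` (one integer per axis, separated by dots), with anything else, such as `.zarray`, being metadata. Both function names are hypothetical helpers, not part of zarr.

```python
from itertools import product
from math import ceil


def initialized_chunks(keys):
    """Parse chunk keys such as "0.1" into index tuples, skipping other keys."""
    found = set()
    for key in keys:
        name = key.rsplit("/", 1)[-1]  # drop any path prefix in front of the key
        parts = name.split(".")
        # Metadata keys like ".zarray" contain non-numeric parts and are skipped.
        if all(part.isdigit() for part in parts):
            found.add(tuple(int(part) for part in parts))
    return found


def empty_chunks(shape, chunks, keys):
    """Return the chunk-grid indices that have no key in the store."""
    grid = [range(ceil(n / c)) for n, c in zip(shape, chunks)]
    return sorted(set(product(*grid)) - initialized_chunks(keys))


keys = [".zarray", "0.0", "1.1"]
print(empty_chunks((4, 4), (2, 2), keys))  # [(0, 1), (1, 0)]
```

For an array `z` that exposes `chunk_store`, the call would presumably be `empty_chunks(z.shape, z.chunks, z.chunk_store.keys())`, which only reads key names and never decompresses any chunk.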

jakirkham
0

I don't think there is a general way to find out which chunks are initialized that works for every storage type, but for DirectoryStore you can list the file system to see which chunks are initialized. This is how zarr computes the nchunks_initialized property.

I suppose you could take some inspiration from there to list all initialized chunks and then compute which slice of the array each one corresponds to.
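
For a DirectoryStore, a rough sketch of the listing step could look like this. It assumes the flat on-disk layout in which each chunk is a file such as `0.0` inside the array's directory and metadata files start with a dot (`.zarray`, `.zattrs`); `chunk_files` is a hypothetical helper, not something zarr provides.

```python
import os


def chunk_files(array_dir):
    """Return the sorted names of chunk files found directly in array_dir."""
    return sorted(
        name
        for name in os.listdir(array_dir)
        if not name.startswith(".")  # skip metadata such as .zarray
        and os.path.isfile(os.path.join(array_dir, name))  # skip sub-directories
    )
```

Each returned name can then be split on `.` to get the chunk's position along each axis.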

While there is no object for a chunk in zarr, you can compute their beginning and end along each axis from the array dimensions and chunk dimensions. If you want to load the chunks one by one for efficiency reasons, you can compute their indices and slice the zarr Array to get a numpy array as a working area.
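
The computation described above can be sketched as follows. `chunk_slices` is a hypothetical name, and it only needs the array shape and chunk shape (`z.shape` and `z.chunks` for a zarr Array `z`), so it does not touch any data.

```python
from itertools import product


def chunk_slices(shape, chunks):
    """Yield (index, region) for every chunk of an array.

    index is the chunk's position in the chunk grid; region is a tuple of
    slice objects selecting that chunk, clipped at the array edge.
    """
    axes = [range(0, n, c) for n, c in zip(shape, chunks)]
    for starts in product(*axes):
        index = tuple(s // c for s, c in zip(starts, chunks))
        region = tuple(
            slice(s, min(s + c, n)) for s, c, n in zip(starts, chunks, shape)
        )
        yield index, region


# A (5, 4) array with (2, 4) chunks has three chunks along the first axis;
# the last one is shorter because 5 is not a multiple of 2.
for index, region in chunk_slices((5, 4), (2, 4)):
    print(index, region)
```

Indexing the array with one of these regions, as in `z[region]`, then gives the numpy working area for just that chunk.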

Since I had similar needs, I wrote some helper functions to do just that; you can look them up at https://github.com/maxime915/msi_zarr_analysis/blob/126c1115bd43e8813d2f002673491c6ef25e37db/msi_zarr_analysis/utils/iter_chunks.py if you want some inspiration.

Maxime A