
I'm trying to use a zarr store as a fixed-size buffer, i.e. new data is appended to the end and, once a certain size is reached, the same amount of data is removed from the beginning.

The store is huge (20 TB) and contains a 2D matrix (positions over time).

Writing to zarr is handled by xarray.

However, I'm not sure whether zarr supports this.

I can think of two solutions:

  • create a new xarray object from the first one, without the older data. However, writing that to disk will either append (mode "a"), leaving the older data intact, or overwrite (mode "w"), in which case I'm afraid the whole store is rewritten, which would be far too slow for 20 TB.
  • use zarr.core.Array.resize, but this does not seem to allow dropping data at the start of an axis.

Maybe zarr does not support this, and I'll have to think of another solution or write my own store specifically aimed at this type of problem.

bluppfisk
  • This is an interesting problem. It sounds like you are using Zarr's DirectoryStore on a POSIX filesystem, is that right? Have you seen https://medium.com/informatics-lab/creating-a-data-format-for-high-momentum-datasets-a394fa48b671? – jhamman Jan 17 '23 at 18:10
  • Thank you for this invaluable reference! I think their solution might work for us; however, they haven't posted an update since 2019, so I wonder where it ended up. – bluppfisk Jan 19 '23 at 16:53
