Questions tagged [zarr]

Zarr is a Python package providing an implementation of compressed, chunked, N-dimensional arrays (similar to NetCDF4 and HDF5), designed for use in parallel computing and in the cloud. See http://zarr.readthedocs.io/en/stable/ for more information.

93 questions
3 votes, 1 answer

Asynchronous Xarray writing to Zarr

I'm using a Dask Distributed cluster to write Zarr+Dask-backed Xarray Datasets inside a loop, and dataset.to_zarr is blocking. This can really slow things down when straggler chunks block the continuation of the loop. …
jkmacc
3 votes, 3 answers

Access one chunk in Zarr

Zarr saves an array on disk in chunks, with each chunk stored as a separate file. Is there a way to access only one chosen chunk (file)? Can it be determined which chunks are empty without loading the whole array into memory?
alex
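
A minimal sketch of how an element index maps to a chunk file name, which may help with the question above. It assumes the Zarr v2 directory layout, where each chunk is stored under a key made of its chunk-grid indices joined by "." (the default dimension separator; nested stores use "/"). The helper names `chunk_key` and `chunk_grid_shape` and the example shape are hypothetical, chosen only for illustration.

```python
import math


def chunk_key(index, chunks, separator="."):
    """Return the Zarr v2 style key of the chunk holding the element at `index`.

    `chunks` is the chunk shape; each grid index is the element index
    floor-divided by the chunk length along that dimension.
    """
    if len(index) != len(chunks):
        raise ValueError("index and chunks must have the same number of dimensions")
    return separator.join(str(i // c) for i, c in zip(index, chunks))


def chunk_grid_shape(shape, chunks):
    """Return how many chunks the array has along each dimension."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


# Hypothetical array: 1000 x 1000 elements stored in 100 x 250 chunks.
shape = (1000, 1000)
chunks = (100, 250)

print(chunk_key((345, 678), chunks))    # 345 // 100 = 3, 678 // 250 = 2 -> "3.2"
print(chunk_grid_shape(shape, chunks))  # (10, 4)
```

Under these assumptions, the element at (345, 678) would live in the file named `3.2` inside the array's directory, provided that chunk has been written. Slicing the array over exactly that chunk's extent (here rows 300:400, columns 500:750) should read only that one chunk rather than the whole array.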
2 votes, 1 answer

can you save an object array using zarr?

Following zarr's tutorial, I'm trying to save a list of lists of ints to a persistent zarr: Failed method 1: import numcodecs, zarr zarr.save("path/to/zarr", [[1], [2]], dtype=object, object_codec=numcodecs.JSON()) Failed method 2: import…
David Taub
2 votes, 1 answer

printing the uploading progress using `xarray.Dataset.to_zarr` function

I'm trying to upload an xarray dataset to GCP using the function ds.to_zarr(store=store), and it works perfectly. However, I would like to show the progress for big datasets. Is there any option to chunk my dataset in a way I can use tqdm or something…
Henry Ruiz
2 votes, 1 answer

How to store data from dask.distributed on disk?

I'm trying to scale my computations from local Dask Arrays to Dask Distributed. Unfortunately, I am new to distributed computing, so I could not adapt the answer here for my purpose. Mainly my problem is saving data from distributed computations back…
Helmut
2 votes, 2 answers

How to use dask.array.from_zarr to open a zarr file in Dask?

I'm having trouble converting a zarr file to a dask array. This is what I get when I type arr = da.from_zarr('gros.zarr/time'): but when I try it on a single coordinate such as time, it works: Any ideas how to solve this?
Severus
2 votes, 1 answer

How to match all variables in xarray encoding (blosc, zarr compression)

The xarray documentation gives the following example of how to use zarr compression: In [42]: import zarr In [43]: compressor = zarr.Blosc(cname="zstd", clevel=3, shuffle=2) In [44]: ds.to_zarr("foo.zarr", encoding={"foo": {"compressor":…
marscher
2 votes, 1 answer

open remote zarr store with many groups and keep coordinates using xarray

I would like to read the remote zarr store at https://hrrrzarr.s3.amazonaws.com/index.html#sfc/20210208/20210208_00z_anl.zarr/. Information about the zarr store is at https://mesowest.utah.edu/html/hrrr/zarr_documentation/zarrFileVariables.html I am able…
Ray Bell
2 votes, 1 answer

open_mfdataset() on remote zarr store giving zarr.errors.GroupNotFoundError

I'm looking to read a remote zarr store using xarray.open_mfdataset(), but I'm getting a zarr.errors.GroupNotFoundError: group not found at path ''. Traceback at the bottom. import xarray as xr import s3fs fs = s3fs.S3FileSystem(anon=True) uri =…
Ray Bell
2 votes, 1 answer

optimise zarr array processing

I have a list (mylist) of 80 5-D zarr files with the following structure (T, F, B, Az, El). The array has shape [24x4096x2016x24x8]. I want to extract sliced data and run a probability along some axis using the following function def…
Nad
2 votes, 1 answer

Dask array to zarr with unknown shapes

I am trying to store a dask array in a zarr file. I have managed to do it when the dask array has a defined shape. import dask import dask.array as da import numpy as np from tempfile import TemporaryDirectory import zarr np_array =…
1 vote, 1 answer

Better way to identify chunks where data is available in zarr

I have a zarr store of weather data at a 1-hour time interval for the year 2022, so 8760 chunks. But there is data only for random days. How do I check for which hours in 0 to 8760 data is available? Also the store is defined with…
sjd
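
One possible way to approach the question above without loading any array data is to list the chunk files that exist on disk. The sketch below assumes a Zarr v2 directory store with flat "."-separated chunk keys, one chunk per hour along the first axis, and that chunks which were never written have no file. The function name `present_chunk_indices` and the demo file names are hypothetical.

```python
import os
import re
import tempfile

# Zarr v2 flat chunk keys look like "12.0.3"; metadata files such as
# ".zarray" and ".zattrs" start with a dot and do not match.
CHUNK_KEY = re.compile(r"^\d+(\.\d+)*$")


def present_chunk_indices(array_dir, axis=0):
    """Return sorted chunk-grid indices along `axis` that have a chunk file."""
    found = set()
    for name in os.listdir(array_dir):
        if CHUNK_KEY.match(name):
            found.add(int(name.split(".")[axis]))
    return sorted(found)


# Demo on a fake array directory containing only empty placeholder files.
with tempfile.TemporaryDirectory() as array_dir:
    for name in (".zarray", ".zattrs", "0.0.0", "5.0.0", "5.0.1", "8759.0.0"):
        open(os.path.join(array_dir, name), "w").close()
    hours_with_data = present_chunk_indices(array_dir)

print(hours_with_data)  # [0, 5, 8759]
```

A file being present only shows that the chunk was written at some point. Whether it holds anything other than the fill value depends on how the store was created, so this is a first filter rather than a guarantee.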
1 vote, 0 answers

What is the maximum array size for Zarr?

I was not able to find information on any limits of Zarr arrays. Is there any practical limit on the size of Zarr arrays apart from the disk space? This covers both the number of dimensions and the size of each dimension.
SultanOrazbayev
1 vote, 1 answer

Using xarray to convert a zarr file to a netcdf causing memory allocation error

I have a zarr file that is too large to fit in memory, and I'd like to convert it to netCDF. My computer has 32 GB of RAM, so writing ~5.5 GB chunks shouldn't be a problem. However, within seconds of running this script, my memory usage quickly tops…
1 vote, 1 answer

Open root zarr with multiple groups using xarray

Suppose I had a zarr file that has n groups, each of which has only one zarr array and shares at least 3 dimensions but may have others as well. How would I load an xarray Dataset from said zarr root file while aligning their common dimensions and…
Curious