Questions tagged [zarr]

Zarr is a Python package providing an implementation of compressed, chunked, N-dimensional arrays (similar to NetCDF4 and HDF5), designed for use in parallel computing and in the cloud. See http://zarr.readthedocs.io/en/stable/ for more information.

93 questions
1 vote • 1 answer

zarr not respecting chunk size from xarray and reverting to original chunk size

I'm opening a zarr file, rechunking it, and then writing it back out to a different zarr store. Yet when I open it back up, it doesn't respect the chunk size I previously wrote. Here is the code and the output from jupyter. Any idea what I'm…
clifgray • 4,313 • 11 • 67 • 116
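A likely cause: xarray keeps the source store's chunking in each variable's encoding["chunks"], and that encoding takes precedence over the new dask chunks when writing. A minimal sketch of the usual workaround, with hypothetical store paths:

    import xarray as xr

    # "source.zarr" and "rechunked.zarr" stand in for the asker's stores.
    ds = xr.open_zarr("source.zarr")
    ds = ds.chunk({"time": 100})  # the chunking we want on disk

    # Clear the inherited chunk encoding so the dask chunks define the layout.
    for name in ds.variables:
        ds[name].encoding.pop("chunks", None)

    ds.to_zarr("rechunked.zarr", mode="w")
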
1 vote • 1 answer

xarray.Dataset.to_zarr: docs for “Appending to existing Zarr stores”?

In the description of the 'region' argument to xarray.Dataset.to_zarr the last sentence states: See “Appending to existing Zarr stores” in the reference documentation for full details. I have not been able to find this reference in the reference…
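The passage being referenced covers region writes. A minimal illustration of the region keyword, assuming a recent xarray (the store path and variable name are made up):

    import numpy as np
    import xarray as xr

    # Lay out the full store first.
    ds = xr.Dataset({"v": ("time", np.zeros(10))})
    ds.to_zarr("store.zarr", mode="w")

    # Rewrite just elements 2..4 of the existing array, in place.
    update = xr.Dataset({"v": ("time", np.ones(3))})
    update.to_zarr("store.zarr", mode="r+", region={"time": slice(2, 5)})
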
1 vote • 1 answer

Concurrently write xarray datasets to zarr - how to efficiently scale with dask distributed

TLDR: How can I efficiently use dask-distributed to write a number of dask-backed xarray datasets to a zarr store on AWS S3? Details: I have a workflow that takes a list of raster datasets on S3 and generates a dask-array backed xarray dataset. I…
Val • 6,585 • 5 • 22 • 52
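One common pattern: build every store's write graph with compute=False, then hand them all to the scheduler in a single compute call so chunk uploads from the different datasets can be interleaved. A sketch with synthetic data and a hypothetical bucket:

    import dask
    import dask.array as da
    import fsspec
    import xarray as xr
    from dask.distributed import Client

    client = Client()  # local cluster here; point at your real cluster instead

    # Stand-ins for the dask-backed datasets built from the S3 rasters.
    datasets = [
        xr.Dataset({"band": (("y", "x"), da.zeros((1024, 1024), chunks=512))})
        for _ in range(4)
    ]
    targets = [fsspec.get_mapper(f"s3://my-bucket/out/{i}.zarr") for i in range(4)]

    # compute=False returns one delayed object per store; a single dask.compute
    # lets the scheduler overlap all the writes instead of running them serially.
    writes = [ds.to_zarr(t, mode="w", compute=False) for ds, t in zip(datasets, targets)]
    dask.compute(*writes)
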
1 vote • 1 answer

Zarr: improve xarray writing performance to S3

Writing xarray datasets to AWS S3 takes a surprisingly long time, even when no data is actually written with compute=False. Here's an example: import fsspec import xarray as xr x = xr.tutorial.open_dataset("rasm") target =…
Val • 6,585 • 5 • 22 • 52
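Part of the overhead is that even with compute=False, to_zarr initializes the store eagerly: every variable's .zarray and attributes are written up front, which on S3 means many small, latency-bound requests. A sketch separating the two phases (the bucket name is hypothetical):

    import time

    import fsspec
    import xarray as xr

    x = xr.tutorial.open_dataset("rasm").chunk({"time": 12})
    target = fsspec.get_mapper("s3://my-bucket/rasm.zarr")

    t0 = time.perf_counter()
    delayed = x.to_zarr(target, mode="w", compute=False)  # metadata written here
    print(f"setup: {time.perf_counter() - t0:.1f}s before any chunk is uploaded")

    delayed.compute()  # the actual chunk uploads happen here
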
1 vote • 1 answer

Zarr open() returns FSPathExistNotDir error

When I run zarr.open('result.zarr', mode='r') I get the following error: FSPathExistNotDir: path exists but is not a directory: %r According to the example in the Zarr documentation located at…
tmor83 • 11 • 1
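The message means result.zarr exists on disk as a plain file rather than a directory, while zarr.open with a string path expects a directory store. One plausible case is a zipped store, which opens through ZipStore instead; a sketch:

    import os

    import zarr

    path = "result.zarr"
    if os.path.isdir(path):
        z = zarr.open(path, mode="r")          # ordinary DirectoryStore layout
    else:
        store = zarr.ZipStore(path, mode="r")  # the path is a single file
        z = zarr.open(store, mode="r")
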
1 vote • 1 answer

getting KeyError '.zmetadata' when opening remote zarr store

Trying to read a zarr store from S3 using xarray, but getting a KeyError. Any thoughts? import fsspec import xarray as xr uri = "s3://era5-pds/zarr/2020/12/data/eastward_wind_at_10_metres.zarr" ds = xr.open_zarr(fsspec.get_mapper(uri, anon=True),…
Ray Bell • 1,508 • 4 • 18 • 45
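The KeyError comes from looking for consolidated metadata (the .zmetadata key) that this store never wrote. Opening with consolidated=False reads the per-array metadata instead; a sketch completing the snippet above:

    import fsspec
    import xarray as xr

    uri = "s3://era5-pds/zarr/2020/12/data/eastward_wind_at_10_metres.zarr"
    ds = xr.open_zarr(fsspec.get_mapper(uri, anon=True), consolidated=False)
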
1 vote • 0 answers

Timeout when writing files to zarr.storage.DBMStore which makes the whole zarr store unreadable

I'm new to zarr, and trying to use xarray to output files to zarr.storage.DBMStore. Since the dataset is quite large, I calculate and output the results to one zarr.storage.DBMStore multiple times (append_dim='time'). However, every time the…
Qinkong • 11 • 1
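A conservative pattern, assuming no other process touches the DBM file: reopen the store for each append and close it immediately afterwards, so every write is flushed and a timeout cannot strand a half-written store. A sketch with synthetic per-step data:

    import numpy as np
    import xarray as xr
    import zarr

    def batches():
        # Stand-in for the asker's computed results, one time step each.
        for t in range(3):
            yield xr.Dataset(
                {"v": (("time", "x"), np.full((1, 4), t, dtype="f4"))},
                coords={"time": [t]},
            )

    first = True
    for ds in batches():
        store = zarr.DBMStore("out.zarr.db")
        if first:
            ds.to_zarr(store, mode="w")
            first = False
        else:
            ds.to_zarr(store, append_dim="time")
        store.close()  # flush before the next append
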
1 vote • 1 answer

xarray.Dataset.to_zarr: overwrite data if exists with append_dim

With xarray.Dataset.to_zarr it is possible to write an xarray Dataset to a zarr store and append new data along a dimension using the append_dim parameter. However, if the coordinate of the new data for this dimension is already there, the existing data…
SyntaxError • 330 • 3 • 16
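append_dim only ever appends. To replace values at coordinates that already exist, target them with the region keyword instead; a sketch with made-up sizes, assuming a recent xarray:

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"v": ("time", np.arange(10.0))}, coords={"time": np.arange(10)})
    ds.to_zarr("store.zarr", mode="w")

    # New values for time steps 6..9, which the store already contains.
    new = xr.Dataset({"v": ("time", np.zeros(4))}, coords={"time": np.arange(6, 10)})
    new.drop_vars("time").to_zarr("store.zarr", mode="r+", region={"time": slice(6, 10)})
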
1 vote • 1 answer

Poor CPU utilization when transforming netcdfs to zarr and rechunking

I am transferring and rechunking data from netcdf to zarr. The process is slow and is not using much of the CPUs. I have tried several different configurations; sometimes it seems to do slightly better, but it hasn't worked well. Does anyone have…
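One thing worth ruling out is the default threaded scheduler, since netCDF decoding and compression are largely GIL-bound; spreading the work over processes often helps. A sketch, with illustrative paths and cluster sizing:

    import xarray as xr
    from dask.distributed import Client

    client = Client(n_workers=8, threads_per_worker=1)  # processes, not threads

    ds = xr.open_mfdataset("data/*.nc", parallel=True, chunks={"time": 100})
    ds.to_zarr("out.zarr", mode="w")
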
1 vote • 1 answer

Why am I getting this error when I try to slice a Zarr array exactly the same way I would slice a Numpy array?

I am using the following code to slice a Zarr array from disk: import zarr as zr db = zr.open('/content/drive/My Drive/Share/Daily Data/Database/dbz.zarr', mode='r') data = db[db[:,0]==20171003] Here is the error: IndexError …
lara_toff • 413 • 2 • 14
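Older zarr releases only accept basic selections (integers and slices) in z[...]; boolean masks go through the .oindex (orthogonal) or .vindex (pointwise) accessors instead. A small self-contained sketch (the real array has dates in column 0):

    import numpy as np
    import zarr

    z = zarr.array(np.array([[20171002, 1.0], [20171003, 2.0], [20171003, 3.0]]))

    mask = z[:, 0] == 20171003  # z[:, 0] materializes to numpy, so == works
    data = z.oindex[mask, :]    # boolean selection goes through .oindex, not z[mask]
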
1 vote • 0 answers

MariaDB to a Zarr Array, how can this be done?

Can a MariaDB database be used with Zarr, or migrated to Zarr in a lossless fashion? If so, please provide some guidance on how this can be achieved. I have searched the Zarr docs and MariaDB docs and did not find enough information on this topic. I don't…
Brian • 11 • 2
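Neither project documents a direct MariaDB-to-Zarr bridge; one pragmatic route is streaming the table out through pandas and appending row blocks to a resizable zarr array. A sketch where the connection string, table, and columns are all hypothetical (and only numeric columns are handled):

    import pandas as pd
    import sqlalchemy
    import zarr

    engine = sqlalchemy.create_engine("mysql+pymysql://user:pass@host/mydb")

    z = None
    for chunk in pd.read_sql("SELECT a, b, c FROM measurements", engine,
                             chunksize=100_000):
        block = chunk.to_numpy(dtype="f8")
        if z is None:
            z = zarr.open("measurements.zarr", mode="w",
                          shape=(0, block.shape[1]),
                          chunks=(100_000, block.shape[1]), dtype="f8")
        z.append(block)  # grows the array along axis 0, one block at a time
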
1 vote • 2 answers

When does zarr compress a chunk and push it to the underlying storage system?

I'm reading data from a large text file (a VCF) into a zarr array. The overall flow of the code is with zarr.LMDBStore(...) as store: array = zarr.create(..., chunks=(1000,1000), store=store, ...) for line_num, line in enumerate(text_file): …
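In zarr v2 there is no write-back cache: every __setitem__ compresses the chunks it touches and pushes them to the store immediately. Writing chunk-aligned blocks therefore compresses each chunk exactly once, instead of a read-modify-write per row; a sketch with made-up shapes:

    import numpy as np
    import zarr

    z = zarr.open("demo.zarr", mode="w", shape=(10_000, 1_000),
                  chunks=(1_000, 1_000), dtype="f4")

    buf = np.empty((1_000, 1_000), dtype="f4")
    for i in range(10_000):
        row = np.zeros(1_000, dtype="f4")  # stand-in for one parsed VCF line
        buf[i % 1_000] = row
        if i % 1_000 == 999:
            z[i - 999 : i + 1] = buf  # one compress-and-store per chunk row
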
1 vote • 2 answers

How to write a large dask array (numpy.ndarray) to a Zarr file leveraging GPUs?

I am trying to write a large dask array (46 GB with 124–370 MB chunks) to a zarr file using dask. If my dask array was named dask_data, then a simple dask_data.to_zarr("my_zarr.zarr") would work. But from what I understand, this is a…
irahorecka • 1,447 • 8 • 25
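Worth noting: zarr's compression codecs run on the CPU, so the write itself gains nothing from a GPU. If the chunks live on the device (CuPy), they need to come back to host memory before writing; a sketch with a synthetic stand-in array:

    import dask.array as da

    dask_data = da.random.random((8_192, 8_192), chunks=(2_048, 2_048))

    # Hypothetical GPU-backed case: copy device chunks to host first.
    # import cupy
    # dask_data = dask_data.map_blocks(cupy.asnumpy)

    dask_data.to_zarr("my_zarr.zarr", overwrite=True)
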
1 vote • 2 answers

dask histogram from zarr file (a big zarr file)

So here's my question: I have a big 3-dim array, 100 GB in size, as a #zarr file (the array is more than twice the size). I have tried using the histogram from #Dask to calculate, but I get an error saying that it can't do it because the file has…
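Unlike numpy, dask cannot infer bin edges lazily, so da.histogram requires explicit bins and range; with those given, the histogram is reduced chunk by chunk without loading the whole array. A sketch with a hypothetical path and value range:

    import dask.array as da

    x = da.from_zarr("big.zarr")

    counts, edges = da.histogram(x, bins=256, range=(0.0, 4096.0))
    counts = counts.compute()  # only the per-chunk counts are ever in memory
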
1 vote • 1 answer

zarr.consolidate_metadata yields error: 'memoryview' object has no attribute 'decode'

I have an existing LMDB zarr archive (~6GB) saved at path. Now I want to consolidate the metadata to improve read performance. Here is my script: store = zarr.LMDBStore(path) root = zarr.open(store) zarr.consolidate_metadata(store) store.close() I…
mcb • 398 • 2 • 12
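One reported workaround: LMDBStore hands back memoryview objects by default, which consolidate_metadata then tries to .decode(). Opening the store with buffers=False makes it return bytes instead; a sketch (the path is hypothetical):

    import zarr

    path = "path/to/archive"
    store = zarr.LMDBStore(path, buffers=False)
    zarr.consolidate_metadata(store)
    store.close()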