Questions tagged [zarr]

Zarr is a Python package providing an implementation of compressed, chunked, N-dimensional arrays (similar to NetCDF4 and HDF5), designed for use in parallel computing and in the cloud. See http://zarr.readthedocs.io/en/stable/ for more information.

93 questions
1 vote • 1 answer

zarr not respecting chunk size from xarray and reverting to original chunk size

I'm opening a zarr file, rechunking it, and then writing it back out to a different zarr store. Yet when I open it back up, it doesn't respect the chunk size I previously wrote. Here is the code and the output from jupyter. Any idea what I'm…
clifgray • 4,313 • 11 • 67 • 116
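A likely cause: xarray keeps the source store's chunking in each variable's encoding["chunks"], and that encoding takes precedence over the new dask chunks when writing. A minimal sketch of the usual workaround, with hypothetical store paths:

    import xarray as xr

    # "source.zarr" and "rechunked.zarr" stand in for the asker's stores.
    ds = xr.open_zarr("source.zarr")
    ds = ds.chunk({"time": 100})  # the chunking we want on disk

    # Clear the inherited chunk encoding so the dask chunks define the layout.
    for name in ds.variables:
        ds[name].encoding.pop("chunks", None)

    ds.to_zarr("rechunked.zarr", mode="w")
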
1 vote • 1 answer

xarray.Dataset.to_zarr: docs for “Appending to existing Zarr stores”?

In the description of the 'region' argument to xarray.Dataset.to_zarr the last sentence states: See “Appending to existing Zarr stores” in the reference documentation for full details. I have not been able to find this reference in the reference…
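The passage being referenced covers region writes. A minimal illustration of the region keyword, assuming a recent xarray (the store path and variable name are made up):

    import numpy as np
    import xarray as xr

    # Lay out the full store first.
    ds = xr.Dataset({"v": ("time", np.zeros(10))})
    ds.to_zarr("store.zarr", mode="w")

    # Rewrite just elements 2..4 of the existing array, in place.
    update = xr.Dataset({"v": ("time", np.ones(3))})
    update.to_zarr("store.zarr", mode="r+", region={"time": slice(2, 5)})
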
1 vote • 1 answer

Concurrently write xarray datasets to zarr - how to efficiently scale with dask distributed

TLDR: How can I efficiently use dask-distributed to write a number of dask-backed xarray datasets to a zarr store on AWS S3? Details: I have a workflow that takes a list of raster datasets on S3 and generates a dask-array backed xarray dataset. I…
Val • 6,585 • 5 • 22 • 52
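One common pattern: build every store's write graph with compute=False, then hand them all to the scheduler in a single compute call so chunk uploads from the different datasets can be interleaved. A sketch with synthetic data and a hypothetical bucket:

    import dask
    import dask.array as da
    import fsspec
    import xarray as xr
    from dask.distributed import Client

    client = Client()  # local cluster here; point at your real cluster instead

    # Stand-ins for the dask-backed datasets built from the S3 rasters.
    datasets = [
        xr.Dataset({"band": (("y", "x"), da.zeros((1024, 1024), chunks=512))})
        for _ in range(4)
    ]
    targets = [fsspec.get_mapper(f"s3://my-bucket/out/{i}.zarr") for i in range(4)]

    # compute=False returns one delayed object per store; a single dask.compute
    # lets the scheduler overlap all the writes instead of running them serially.
    writes = [ds.to_zarr(t, mode="w", compute=False) for ds, t in zip(datasets, targets)]
    dask.compute(*writes)
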
1 vote • 1 answer

Zarr: improve xarray writing performance to S3

Writing xarray datasets to AWS S3 takes a surprisingly long time, even when no data is actually written with compute=False. Here's an example: import fsspec import xarray as xr x = xr.tutorial.open_dataset("rasm") target =…
Val • 6,585 • 5 • 22 • 52
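Part of the overhead is that even with compute=False, to_zarr initializes the store eagerly: every variable's .zarray and attributes are written up front, which on S3 means many small, latency-bound requests. A sketch separating the two phases (the bucket name is hypothetical):

    import time

    import fsspec
    import xarray as xr

    x = xr.tutorial.open_dataset("rasm").chunk({"time": 12})
    target = fsspec.get_mapper("s3://my-bucket/rasm.zarr")

    t0 = time.perf_counter()
    delayed = x.to_zarr(target, mode="w", compute=False)  # metadata written here
    print(f"setup: {time.perf_counter() - t0:.1f}s before any chunk is uploaded")

    delayed.compute()  # the actual chunk uploads happen here
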
1 vote • 1 answer

Zarr open() returns FSPathExistNotDir error

When I run zarr.open('result.zarr', mode='r') I get the following error: FSPathExistNotDir: path exists but is not a directory: %r According to the example in the Zarr documentation located at…
tmor83 • 11 • 1
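The message means result.zarr exists on disk as a plain file rather than a directory, while zarr.open with a string path expects a directory store. One plausible case is a zipped store, which opens through ZipStore instead; a sketch:

    import os

    import zarr

    path = "result.zarr"
    if os.path.isdir(path):
        z = zarr.open(path, mode="r")          # ordinary DirectoryStore layout
    else:
        store = zarr.ZipStore(path, mode="r")  # the path is a single file
        z = zarr.open(store, mode="r")
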
1 vote • 1 answer

getting KeyError '.zmetadata' when opening remote zarr store

Trying to read a zarr store from S3 using xarray, but getting a KeyError. Any thoughts? import fsspec import xarray as xr uri = "s3://era5-pds/zarr/2020/12/data/eastward_wind_at_10_metres.zarr" ds = xr.open_zarr(fsspec.get_mapper(uri, anon=True),…
Ray Bell • 1,508 • 4 • 18 • 45
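The KeyError comes from looking for consolidated metadata (the .zmetadata key) that this store never wrote. Opening with consolidated=False reads the per-array metadata instead; a sketch completing the snippet above:

    import fsspec
    import xarray as xr

    uri = "s3://era5-pds/zarr/2020/12/data/eastward_wind_at_10_metres.zarr"
    ds = xr.open_zarr(fsspec.get_mapper(uri, anon=True), consolidated=False)
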
1 vote • 0 answers

Timeout when writing files to zarr.storage.DBMStore which makes the whole zarr store unreadable

I'm new to zarr, and trying to use xarray to output files to zarr.storage.DBMStore. Since the dataset is quite large, I calculate and output the results to one zarr.storage.DBMStore multiple times (append_dim='time'). However, every time the…
Qinkong • 11 • 1
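A conservative pattern, assuming no other process touches the DBM file: reopen the store for each append and close it immediately afterwards, so every write is flushed and a timeout cannot strand a half-written store. A sketch with synthetic per-step data:

    import numpy as np
    import xarray as xr
    import zarr

    def batches():
        # Stand-in for the asker's computed results, one time step each.
        for t in range(3):
            yield xr.Dataset(
                {"v": (("time", "x"), np.full((1, 4), t, dtype="f4"))},
                coords={"time": [t]},
            )

    first = True
    for ds in batches():
        store = zarr.DBMStore("out.zarr.db")
        if first:
            ds.to_zarr(store, mode="w")
            first = False
        else:
            ds.to_zarr(store, append_dim="time")
        store.close()  # flush before the next append
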
1 vote • 1 answer

xarray.Dataset.to_zarr: overwrite data if exists with append_dim

With xarray.Dataset.to_zarr it is possible to write an xarray Dataset to a zarr store and append new data along a dimension using the append_dim parameter. However, if the coordinate of the new data for this dimension is already there, the existing data…
SyntaxError • 330 • 3 • 16
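append_dim only ever appends. To replace values at coordinates that already exist, target them with the region keyword instead; a sketch with made-up sizes, assuming a recent xarray:

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"v": ("time", np.arange(10.0))}, coords={"time": np.arange(10)})
    ds.to_zarr("store.zarr", mode="w")

    # New values for time steps 6..9, which the store already contains.
    new = xr.Dataset({"v": ("time", np.zeros(4))}, coords={"time": np.arange(6, 10)})
    new.drop_vars("time").to_zarr("store.zarr", mode="r+", region={"time": slice(6, 10)})
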
1 vote • 1 answer

Poor CPU utilization when transforming netcdfs to zarr and rechunking

I am transferring and rechunking data from netcdf to zarr. The process is slow and is not using much of the CPUs. I have tried several different configurations; sometimes it seems to do slightly better, but it hasn't worked well. Does anyone have…
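One thing worth ruling out is the default threaded scheduler, since netCDF decoding and compression are largely GIL-bound; spreading the work over processes often helps. A sketch, with illustrative paths and cluster sizing:

    import xarray as xr
    from dask.distributed import Client

    client = Client(n_workers=8, threads_per_worker=1)  # processes, not threads

    ds = xr.open_mfdataset("data/*.nc", parallel=True, chunks={"time": 100})
    ds.to_zarr("out.zarr", mode="w")
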
1 vote • 1 answer

Why am I getting this error when I try to slice a Zarr array exactly the same way I would slice a Numpy array?

I am using the following code to slice a Zarr array from disk: import zarr as zr db = zr.open('/content/drive/My Drive/Share/Daily Data/Database/dbz.zarr', mode='r') data = db[db[:,0]==20171003] Here is the error: IndexError …
lara_toff • 413 • 2 • 14
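Older zarr releases only accept basic selections (integers and slices) in z[...]; boolean masks go through the .oindex (orthogonal) or .vindex (pointwise) accessors instead. A small self-contained sketch (the real array has dates in column 0):

    import numpy as np
    import zarr

    z = zarr.array(np.array([[20171002, 1.0], [20171003, 2.0], [20171003, 3.0]]))

    mask = z[:, 0] == 20171003  # z[:, 0] materializes to numpy, so == works
    data = z.oindex[mask, :]    # boolean selection goes through .oindex, not z[mask]
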
1 vote • 0 answers

MariaDB to a Zarr Array, how can this be done?

Can a MariaDB database be used with Zarr, or migrated to Zarr in a lossless fashion? If so, please provide some guidance on how this can be achieved. I have searched the Zarr docs and MariaDB docs and did not find enough information on this topic. I don't…
Brian • 11 • 2
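Neither project documents a direct MariaDB-to-Zarr bridge; one pragmatic route is streaming the table out through pandas and appending row blocks to a resizable zarr array. A sketch where the connection string, table, and columns are all hypothetical (and only numeric columns are handled):

    import pandas as pd
    import sqlalchemy
    import zarr

    engine = sqlalchemy.create_engine("mysql+pymysql://user:pass@host/mydb")

    z = None
    for chunk in pd.read_sql("SELECT a, b, c FROM measurements", engine,
                             chunksize=100_000):
        block = chunk.to_numpy(dtype="f8")
        if z is None:
            z = zarr.open("measurements.zarr", mode="w",
                          shape=(0, block.shape[1]),
                          chunks=(100_000, block.shape[1]), dtype="f8")
        z.append(block)  # grows the array along axis 0, one block at a time
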
1 vote • 2 answers

When does zarr compress a chunk and push it to the underlying storage system?

I'm reading data from a large text file (a VCF) into a zarr array. The overall flow of the code is with zarr.LMDBStore(...) as store: array = zarr.create(..., chunks=(1000,1000), store=store, ...) for line_num, line in enumerate(text_file): …
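In zarr v2 there is no write-back cache: every __setitem__ compresses the chunks it touches and pushes them to the store immediately. Writing chunk-aligned blocks therefore compresses each chunk exactly once, instead of a read-modify-write per row; a sketch with made-up shapes:

    import numpy as np
    import zarr

    z = zarr.open("demo.zarr", mode="w", shape=(10_000, 1_000),
                  chunks=(1_000, 1_000), dtype="f4")

    buf = np.empty((1_000, 1_000), dtype="f4")
    for i in range(10_000):
        row = np.zeros(1_000, dtype="f4")  # stand-in for one parsed VCF line
        buf[i % 1_000] = row
        if i % 1_000 == 999:
            z[i - 999 : i + 1] = buf  # one compress-and-store per chunk row
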
1 vote • 2 answers

How to write a large dask array (numpy.ndarray) to a Zarr file leveraging GPUs?

I am trying to write a large dask array (46 GB with 124–370 MB chunks) to a zarr file using dask. If my dask array was named dask_data, then a simple dask_data.to_zarr("my_zarr.zarr") would work. But from what I understand, this is a…
irahorecka • 1,447 • 8 • 25
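Worth noting: zarr's compression codecs run on the CPU, so the write itself gains nothing from a GPU. If the chunks live on the device (CuPy), they need to come back to host memory before writing; a sketch with a synthetic stand-in array:

    import dask.array as da

    dask_data = da.random.random((8_192, 8_192), chunks=(2_048, 2_048))

    # Hypothetical GPU-backed case: copy device chunks to host first.
    # import cupy
    # dask_data = dask_data.map_blocks(cupy.asnumpy)

    dask_data.to_zarr("my_zarr.zarr", overwrite=True)
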
1 vote • 2 answers

dask histogram from zarr file (a big zarr file)

So here's my question: I have a big 3-dim array, 100 GB in size, as a #zarr file (the array is more than twice the size). I have tried using the histogram from #Dask to calculate, but I get an error saying that it can't do it because the file has…
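Unlike numpy, dask cannot infer bin edges lazily, so da.histogram requires explicit bins and range; with those given, the histogram is reduced chunk by chunk without loading the whole array. A sketch with a hypothetical path and value range:

    import dask.array as da

    x = da.from_zarr("big.zarr")

    counts, edges = da.histogram(x, bins=256, range=(0.0, 4096.0))
    counts = counts.compute()  # only the per-chunk counts are ever in memory
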
1 vote • 1 answer

zarr.consolidate_metadata yields error: 'memoryview' object has no attribute 'decode'

I have an existing LMDB zarr archive (~6GB) saved at path. Now I want to consolidate the metadata to improve read performance. Here is my script: store = zarr.LMDBStore(path) root = zarr.open(store) zarr.consolidate_metadata(store) store.close() I…
mcb • 398 • 2 • 12
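One reported workaround: LMDBStore hands back memoryview objects by default, which consolidate_metadata then tries to .decode(). Opening the store with buffers=False makes it return bytes instead; a sketch (the path is hypothetical):

    import zarr

    path = "path/to/archive"
    store = zarr.LMDBStore(path, buffers=False)
    zarr.consolidate_metadata(store)
    store.close()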