Questions tagged [zarr]

Zarr is a Python package providing an implementation of compressed, chunked, N-dimensional arrays (like NetCDF4, HDF5), designed for use in parallel computing and in the cloud. See http://zarr.readthedocs.io/en/stable/ for more information.

93 questions
0
votes
1 answer

zarr consolidate_metadata errors with TypeError: memoryview: a bytes-like object is required, not 'Array'

I'm trying to consolidate the metadata of an existing zarr store, though the same error occurs if I make a new zarr store and call zarr.consolidate_metadata(store). Code example: import zarr ## create test zarr store path_to_store =…
Adair
  • 1,697
  • 18
  • 22
0
votes
1 answer

xarray loading int data as float

Say I create a dataset with an integer variable. import xarray as xr import numpy as np int_var = np.random.randint(0, 10, 10) ds = xr.Dataset(data_vars={"int_var": (("x"), int_var)}, coords={"x": range(10)}) Then I save it,…
Adair
  • 1,697
  • 18
  • 22
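The usual explanation for the behaviour above is that xarray's decoding has to allow for missing values, and NumPy integer dtypes cannot represent NaN, so any variable with a fill value is promoted to float on load; disabling the fill value when writing (e.g. `encoding={"int_var": {"_FillValue": None}}`) is the commonly suggested fix. A minimal NumPy sketch of the underlying constraint (no xarray or zarr involved here):

```python
import numpy as np

int_var = np.random.randint(0, 10, 10)

try:
    int_var[0] = np.nan  # NumPy int dtypes have no NaN representation
except ValueError as e:
    print("cannot store NaN in an int array:", e)

# Any decode path that may need to mask missing values must therefore
# promote to a float dtype first.
as_float = int_var.astype("float64")
as_float[0] = np.nan  # fine after promotion
```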
0
votes
1 answer

Is it possible to store multiple groups and arrays in a single file?

I've been using HDF5 to store time series data and I want to try using Zarr due to its various features. I'm reading its tutorial and following each step, and I've realized that maybe Zarr uses directories on a file system instead of a single file…
maynull
  • 1,936
  • 4
  • 26
  • 46
0
votes
2 answers

How can I rename a Zarr array without writing new store?

I have a Zarr datastore, but I need to rename one of the dimensions. Let's say I have this (from xarray docs): data = np.random.rand(4, 3) locs = ["IA", "IL", "IN"] times = pd.date_range("2000-01-01", periods=4) da = xr.DataArray(data,…
j sad
  • 1,055
  • 9
  • 16
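For a zarr v2 DirectoryStore, each array lives under its own key prefix (a directory on disk), so a rename can in principle be done in place: rename the directory and patch any JSON metadata that mentions the old name (xarray's `_ARRAY_DIMENSIONS` in `.zattrs`, plus `.zmetadata` if the store is consolidated). A stdlib-only sketch against a hypothetical minimal store layout (the directory and attribute names here are illustrative, not a full zarr store):

```python
import json
import os
import tempfile

# Fabricate a minimal v2-style layout: one array directory with .zattrs.
store = tempfile.mkdtemp()
os.mkdir(os.path.join(store, "old_name"))
with open(os.path.join(store, "old_name", ".zattrs"), "w") as f:
    json.dump({"_ARRAY_DIMENSIONS": ["old_name"]}, f)

def rename_array(store, old, new):
    # 1) Rename the key prefix (a plain directory in a DirectoryStore).
    os.rename(os.path.join(store, old), os.path.join(store, new))
    # 2) Patch metadata that still refers to the old name.
    attrs_path = os.path.join(store, new, ".zattrs")
    with open(attrs_path) as f:
        attrs = json.load(f)
    attrs["_ARRAY_DIMENSIONS"] = [new if d == old else d
                                  for d in attrs.get("_ARRAY_DIMENSIONS", [])]
    with open(attrs_path, "w") as f:
        json.dump(attrs, f)

rename_array(store, "old_name", "new_name")
print(os.listdir(store))  # ['new_name']
```

No chunk data is copied; only the directory entry and the small JSON files change.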
0
votes
2 answers

Transform zarr directory storage to zip storage

Code: store = zarr.ZipStore("/mnt/test.zip", "r") Problem description: Hi, sorry for bothering, I found this statement in the official Zarr documentation about ZipStore: Alternatively, use a DirectoryStore when writing the data, then manually Zip…
eddie Gao
  • 3
  • 1
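The manual-zipping route the documentation suggests can be done with the standard library alone: walk the DirectoryStore and add each file under its path relative to the store root. ZIP-level compression is unnecessary because the chunks are already compressed. A sketch (function name is mine, not part of any API):

```python
import os
import zipfile

def zip_directory_store(src_dir, zip_path):
    """Zip an on-disk store so it can later be read via a ZipStore.

    Entry names must be relative to the store root; ZIP_STORED skips
    re-compressing already-compressed chunk files.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for root, _, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, rel)
```

One likely bug in the question's snippet: in zarr v2, `ZipStore`'s second positional parameter is not the mode, so the read-only open should be spelled `zarr.ZipStore("/mnt/test.zip", mode="r")`.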
0
votes
1 answer

Does Zarr have built-in multi-threading support for fast read and write?

I am trying to speed up reading and writing Zarr files using multi-threading. For example, if I can store an array in 5 chunks, is there a way to use a thread per chunk to speed up reading and writing the array to and from disk (possibly using…
Ali Jooya
  • 75
  • 7
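Zarr does not spawn threads for plain reads and writes itself, but its compression codecs (e.g. Blosc via numcodecs) release the GIL, so a thread pool over chunk-aligned slices is a common pattern. A sketch with a stand-in reader (no zarr here; `read_chunk` would really take a chunk-aligned slice of the array):

```python
from concurrent.futures import ThreadPoolExecutor

N_CHUNKS = 5

def read_chunk(i):
    # Stand-in for reading chunk i; with a real zarr array this would be
    # a chunk-aligned slice like z[i * chunk_len:(i + 1) * chunk_len].
    return bytes([i]) * 4

# One worker per chunk, as the question proposes.
with ThreadPoolExecutor(max_workers=N_CHUNKS) as pool:
    chunks = list(pool.map(read_chunk, range(N_CHUNKS)))

print(len(chunks))  # 5
```

The same pattern works for writes, since writes to distinct chunks touch distinct store keys and need no locking.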
0
votes
1 answer

verify that Zarr has been fully installed by running the test suite

Suggested commands from the Zarr docs: $ pip install pytest $ python -m pytest -v --pyargs zarr What I tried to make it work: $ pip3 install pytest (succeeded) $ python3.7 pytest -v --pyargs zarr Error I…
0
votes
1 answer

Memory leak issue using PyTorch IterableDataset with zarr

I'm trying to build a PyTorch project on an IterableDataset with zarr as the storage backend. class Data(IterableDataset): def __init__(self, path, start=None, end=None): super(Data, self).__init__() store =…
sobek
  • 1,386
  • 10
  • 28
0
votes
1 answer

Dask looping overhead from libraries

When calling another library from dask, such as scikit-image's contrast stretch, I realise that dask creates a result for each block, storing it either in memory or spilling it to disk separately. Then it attempts to merge all the results. That's fine if…
0
votes
1 answer

Limit memory footprint when storing `dask.array.map_blocks` output

Consider a 2D array X too large to fit in memory--in my case it's stored in the Zarr format, but that doesn't matter. I would like to map a function block-wise over the array and save the result without ever loading the entire array into…
Richard Border
  • 3,209
  • 16
  • 30
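The memory pattern being asked for can be shown with NumPy alone: keep both input and output on disk and materialize one block at a time. Here `np.memmap` stands in for the on-disk Zarr arrays, and the shapes are illustrative:

```python
import os
import tempfile
import numpy as np

rows, cols, block_rows = 8, 4, 2
tmp = tempfile.mkdtemp()

# On-disk input and output; stand-ins for zarr arrays.
X = np.memmap(os.path.join(tmp, "x.dat"), dtype="float64",
              mode="w+", shape=(rows, cols))
X[:] = np.arange(rows * cols).reshape(rows, cols)

Y = np.memmap(os.path.join(tmp, "y.dat"), dtype="float64",
              mode="w+", shape=(rows, cols))

# Process one block at a time: only `block_rows` rows are ever held
# as a regular in-memory array.
for start in range(0, rows, block_rows):
    stop = min(start + block_rows, rows)
    block = np.asarray(X[start:stop])   # load just this block
    Y[start:stop] = block * 2           # write the result straight to disk
Y.flush()
```

With dask the analogous route is `dask.array.map_blocks(f, X)` followed by `.store(target)` or `.to_zarr(...)`, which writes blocks as they complete rather than gathering them; calling `.compute()` on the full result is what forces everything into memory.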
0
votes
1 answer

Efficient way of storing 1TB of random data with Zarr

I'd like to store 1TB of random data backed by a zarr on disk array. Currently, I am doing something like the following: import numpy as np import zarr from numcodecs import Blosc compressor = Blosc(cname='lz4', clevel=5,…
quasiben
  • 1,444
  • 1
  • 11
  • 19
0
votes
0 answers

How can one write lock a zarr store during append?

Is there some way to lock a zarr store when using append? I have already found out the hard way that using append with multiple processes is a bad idea (the batches to append aren't aligned with the batch size of the store). The reason I'd like to…
sobek
  • 1,386
  • 10
  • 28
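Zarr ships a `ProcessSynchronizer` for chunk-level locking, but serializing whole `append` calls across processes needs a store-level lock held around each append. A minimal stdlib sketch using atomic exclusive file creation (illustrative only; a library such as `fasteners` is more robust in practice):

```python
import os
import time

class FileLock:
    """Minimal cross-process lock via atomic O_CREAT|O_EXCL file creation."""

    def __init__(self, path):
        self.path = path

    def acquire(self, timeout=10.0, poll=0.05):
        deadline = time.monotonic() + timeout
        while True:
            try:
                # Creating the lock file succeeds in exactly one process.
                fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.close(fd)
                return
            except FileExistsError:
                if time.monotonic() > deadline:
                    raise TimeoutError(f"could not acquire {self.path}")
                time.sleep(poll)

    def release(self):
        os.remove(self.path)
```

Wrapping each `z.append(batch)` in `acquire()`/`release()` makes the appends run one at a time, so chunk boundaries in the store stay consistent regardless of batch size.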
0
votes
1 answer

Display all variants

I have a 2GB VCF DNA file and I am trying to use vcf_to_zarr() to print out all the variants with all fixed fields, but I am getting the error KeyError: 'variants/*' allel.vcf_to_zarr import allel import numcodecs import zarr def readVcf(): …
user11766958
  • 409
  • 3
  • 12
0
votes
1 answer

Zarr multithreaded reading of groups

Not sure if this question makes sense/is relevant wrt zarr. I'm storing zarr data on disk in groups so for example I have group = zarr.group() d1 = group.create_dataset('baz', shape=100, chunks=10) d2 = group.create_dataset('foo', shape=100,…
Michael
  • 7,087
  • 21
  • 52
  • 81
0
votes
2 answers

How to create .mdb file?

I am new to zarr, HDF5 and LMDB. I have converted data from HDF5 to Zarr but I got many files with extension .n (n from 0 to 31). I want to have just one file with a .zarr extension. I tried to use LMDB (the zarr.LMDBStore function) but I don't…