Questions tagged [zarr]

Zarr is a Python package providing an implementation of compressed, chunked, N-dimensional arrays (similar to NetCDF4 and HDF5), designed for use in parallel computing and on the cloud. See http://zarr.readthedocs.io/en/stable/ for more information.

93 questions
1 vote · 2 answers

How do I encode NaN values in xarray / zarr with integer dtype?

I have a large xarray DataArray containing NaNs and want to save it with zarr. I want to minimize the file size and am OK with losing a few bits of precision - 16 bits ought to be OK. I tried using numcodecs.FixedScaleOffset(astype='u2') filter but…
user7813790 · 547 · 1 · 4 · 12
1 vote · 1 answer

What would happen in case of concurrent read/write access?

In the zarr tutorial it is written: "Zarr arrays have not been designed for situations where multiple readers and writers are concurrently operating on the same array." What would happen if it does happen? Will it crash? Undefined behavior? Will it just…
agemO · 263 · 2 · 9
1 vote · 1 answer

Multiple compressors for a single array

Is it possible to have different compressors, e.g. lossy and lossless for individual chunks? In a scenario, where you have a mask of importance, where you want to keep signal with lossless compression or even with no compression, but have other…
genezin · 63 · 7
1 vote · 1 answer

Round-tripping Zarr data from Xarray

With xarray, I'm using ds.to_zarr() to write a dataset to S3 and then xr.open_zarr() to see if I get the same dataset back. My dataset in xarray looks like: Dimensions: (nv: 2, reference_time: 11, time: 11, x:…
Rich Signell · 14,842 · 4 · 49 · 77
0 votes · 1 answer

Upload zarr folder to Azure blob container

I tried to upload a zarr file (folder-like) to an Azure container using Python but it does not work properly, as it only uploaded the innermost files and deleted everything else in the container. This is my code: def upload_zarr(file_path): …
Minh Phan · 33 · 3
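A likely cause of "only the innermost files" arriving is discarding the directory structure when naming blobs. A sketch that walks the `.zarr` folder and keeps relative paths (the `container` argument is assumed to be an `azure.storage.blob.ContainerClient`; `prefix` is illustrative):

```python
import os

def blob_name_for(base: str, full: str, prefix: str) -> str:
    """Map a local file path to a blob name that preserves the zarr layout."""
    rel = os.path.relpath(full, start=base)
    return f"{prefix}/{rel.replace(os.sep, '/')}"

def upload_zarr(container, local_path: str, prefix: str) -> None:
    """container: an azure.storage.blob.ContainerClient (assumed available)."""
    for root, _dirs, files in os.walk(local_path):
        for name in files:
            full = os.path.join(root, name)
            with open(full, "rb") as fh:
                container.upload_blob(
                    name=blob_name_for(local_path, full, prefix),
                    data=fh, overwrite=True,
                )
```

Because blob storage is flat, the `/`-separated names are what make the store readable as a hierarchy again (e.g. by `zarr` over an Azure fsspec filesystem).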
0 votes · 0 answers

Open zarr with xarray_tensorstore: deep copy is not working after opening file

I opened a file using xarray_tensorstore.open_zarr. While the opening was successful, I encountered an issue when trying to use copy(deep=True). However, copy(deep=True) works correctly when I use the original dataset or open the file with…
A-_-S · 680 · 5 · 21
0 votes · 0 answers

Compressing integers to 12-bit packed format with zarr and numcodecs

I am working with 12-bit integer image data (i.e. data acquired from a camera that uses a 12-bit ADC) which I ultimately store in a Zarr array. Currently I store the images as 16-bit integers, which means I am wasting 30% extra memory. I would like…
ptbrown · 101 · 2 · 10
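The packing itself is straightforward with NumPy: two 12-bit values fit in three bytes. A sketch of the pack/unpack pair (to plug into zarr this would be wrapped in a custom `numcodecs.abc.Codec` with `encode`/`decode` methods, which is an extra step not shown here):

```python
import numpy as np

def pack12(values: np.ndarray) -> np.ndarray:
    """Pack an even-length array of 12-bit ints (0..4095): 2 values -> 3 bytes."""
    v = values.astype(np.uint16)
    a, b = v[0::2], v[1::2]
    out = np.empty(3 * a.size, dtype=np.uint8)
    out[0::3] = a >> 4                        # high 8 bits of a
    out[1::3] = ((a & 0x0F) << 4) | (b >> 8)  # low 4 of a, high 4 of b
    out[2::3] = b & 0xFF                      # low 8 bits of b
    return out

def unpack12(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack12: 3 bytes -> 2 uint16 values."""
    p = packed.astype(np.uint16)
    a = (p[0::3] << 4) | (p[1::3] >> 4)
    b = ((p[1::3] & 0x0F) << 8) | p[2::3]
    out = np.empty(2 * a.size, dtype=np.uint16)
    out[0::2], out[1::2] = a, b
    return out
```

Note the 25% size saving only applies to the raw bytes; a general-purpose compressor such as Blosc on the 16-bit data may already remove much of that redundancy.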
0 votes · 1 answer

Resizing an xarray Dataset dimension's coordinates for to_zarr()

I have a dataset and wish to append it to an existing zarr, and I must handle that one dimension in the dataset, 'range', is of a different size than in the zarr. I'm hoping to just pad data with zeros or NaN. The problem: here is what happens when I…
vindo · 1 · 1
0 votes · 1 answer

How to store a subset of Xarray data into Zarr?

Context In the section Appending to existing Zarr stores, the example is as follows import xarray as xr import dask.array # Write zarr with empty structure dummies = dask.array.zeros(30, chunks=10) ds = xr.Dataset({"foo": ("x", dummies)}) path =…
Dahn · 1,397 · 1 · 10 · 29
0 votes · 0 answers

Initializing empty zarr for parallel writing

I want to select subset area from ERA5 Zarr and write it to my own bucket. I want to do parallel processing. I first read the data: import xarray, fsspec, boto3, dask import dask.array as da from dask.distributed import Client, LocalCluster from…
Pörripeikko · 839 · 7 · 6
0 votes · 0 answers

Change dtype of already existing zarr array

Could you tell me please if it is possible to change the dtype of already created zarr array? Best regards, Aliaksei
aaxx · 55 · 1 · 5
0 votes · 0 answers

aiohttp.client_exceptions.ClientConnectorError when using a proxy (clash) while downloading CMIP6 data through pangeo

1. Environment Linux (#44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon May 22 13:39:36 UTC 2 x86_64 x86_64 x86_64 GNU/Linux) Anaconda (23.1.0) running a proxy (clash) conda environment created by python_pangeo_cmip6 2. Logs/tracks $ python get_cmip6.py…
Lorraine · 1 · 1
0 votes · 1 answer

Writing a large xarray dataset to disk without killing the kernel

Context: I have the following dataset: Goal: I want to write it on my disk. I am using chunks so the dataset does not kill my kernel. Problem: I tried to save it on my disk with chunks using: Option 1: to_zarr -> biggest homogeneous chunks…
Nihilum · 549 · 3 · 11
0 votes · 0 answers

Why does zarr.convenience.consolidate_metadata() not work within a LinearFlow (metaflow)?

This is an issue about metaflow, zarr, python I am creating a LinearFlow using metaflow and zarr. All is going well except one key zarr function: when I try to consolidate all my metadata into a Metadata Store inside the flow, I get no error message…
0 votes · 1 answer

Interpolating values in xarray using non-indexed coordinates

I'm trying to fetch time series at geographical coordinates (single points) from the Google ERA5 reanalysis data. The dataset is the following: import xarray data = xarray.open_zarr( 'gs://gcp-public-data-arco-era5/co/single-level-reanalysis.zarr/', …
Pörripeikko · 839 · 7 · 6
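When latitude/longitude are non-indexed (e.g. two-dimensional) coordinates, `.sel`/`.interp` cannot be used directly on them; one workaround is a nearest-neighbour lookup via an explicit distance argmin. A sketch on a toy grid (the real ARCO ERA5 store's layout and variable names are assumptions, so this only illustrates the technique):

```python
import numpy as np
import xarray as xr

# Toy stand-in for a dataset whose lat/lon are 2-D, non-indexed coordinates.
lat2d, lon2d = np.meshgrid(np.linspace(60, 61, 5),
                           np.linspace(24, 25, 5), indexing="ij")
ds = xr.Dataset(
    {"t2m": (("values_y", "values_x"), np.random.rand(5, 5))},
    coords={"latitude": (("values_y", "values_x"), lat2d),
            "longitude": (("values_y", "values_x"), lon2d)},
)

# Nearest grid cell to a target point via squared-distance argmin.
target_lat, target_lon = 60.2, 24.9
dist2 = (ds["latitude"] - target_lat) ** 2 + (ds["longitude"] - target_lon) ** 2
iy, ix = np.unravel_index(int(dist2.argmin()), dist2.shape)
point = ds["t2m"].isel(values_y=iy, values_x=ix)
```

The squared degree distance is a rough metric; for accuracy near the poles or across many points, a proper geodesic distance or a KD-tree over the coordinate arrays would be the better choice.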