Questions tagged [zarr]

Zarr is a Python package providing an implementation of compressed, chunked, N-dimensional arrays (similar to NetCDF4 and HDF5), designed for use in parallel computing and in the cloud. See http://zarr.readthedocs.io/en/stable/ for more information.

93 questions
0 votes • 1 answer

How to reproject xarray dataset memory efficiently with chunks and dask?

Context: I have a netcdf file that I want to reproject. It is a costly operation, and I am learning how to use dask and zarr to do it efficiently without crashing my RAM. Code presentation: ds is a 3D xarray dataset (dimensions: time, y, x). This…
Nihilum • 549
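
A minimal sketch of one chunk-friendly pattern for this kind of problem, assuming rioxarray is available; the paths, CRS strings, and the per-time-step loop are illustrative guesses, not the asker's actual setup. The idea is to reproject one time step at a time and append each result to a Zarr store, so only a single 2-D slice is ever in memory.

    import xarray as xr
    import rioxarray  # noqa: F401 -- registers the .rio accessor

    ds = xr.open_dataset("input.nc", chunks={"time": 1})  # lazy, one time step per chunk
    ds = ds.rio.write_crs("EPSG:4326")                    # assumed source CRS

    for i in range(ds.sizes["time"]):
        slab = ds.isel(time=[i]).rio.reproject("EPSG:3857")  # loads only this step
        slab.to_zarr(
            "reprojected.zarr",
            mode="w" if i == 0 else "a",
            append_dim=None if i == 0 else "time",
        )
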
0 votes • 1 answer

How to prevent `to_zarr` method in xarray from writing all nan chunks to disk?

I want to save a very large zarr file (2-dimensional), chunked equally along both dimensions (X, X), which occasionally contains chunks made entirely of NaNs. To reduce the number of chunks written to disk, I want xarray's to_zarr method to skip writing this…
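
If dropping down to the zarr API is an option, recent zarr-python (2.11+) can already do this at the array level: with write_empty_chunks=False, chunks whose contents are entirely the fill value are simply not stored. A sketch with placeholder shapes:

    import numpy as np
    import zarr

    # Chunks that are all NaN (the fill value) never get written to disk.
    z = zarr.open(
        "sparse.zarr", mode="w",
        shape=(20_000, 20_000), chunks=(1_000, 1_000),
        dtype="f8", fill_value=np.nan, write_empty_chunks=False,
    )
    z[:1_000, :1_000] = np.random.rand(1_000, 1_000)  # only this one chunk is stored
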
0 votes • 0 answers

using a zarr store as a fixed-size buffer

I'm trying to use a zarr store as a fixed-size buffer (i.e. new data is appended to the end, and the same amount of data is removed from the beginning when a certain size is reached). The store is huge (20 TB), and contains a 2D matrix (positions…
bluppfisk • 2,538
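
One way to sidestep deleting from the front of a 20 TB store is a ring buffer: a fixed-capacity array plus a write cursor kept in .attrs, so new rows overwrite the oldest ones in place. A rough sketch with invented sizes and dtype (not the asker's schema):

    import numpy as np
    import zarr

    CAPACITY, WIDTH = 1_000_000, 3  # placeholder dimensions

    z = zarr.open("buffer.zarr", mode="a", shape=(CAPACITY, WIDTH),
                  chunks=(10_000, WIDTH), dtype="f4", fill_value=np.nan)
    if "head" not in z.attrs:
        z.attrs["head"] = 0  # index of the next row to overwrite

    def push(rows: np.ndarray) -> None:
        """Append rows, overwriting the oldest data once the buffer is full."""
        head = z.attrs["head"]
        idx = (head + np.arange(len(rows))) % CAPACITY
        z.set_orthogonal_selection((idx, slice(None)), rows)
        z.attrs["head"] = int((head + len(rows)) % CAPACITY)
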
0 votes • 0 answers

Predict Zarr file size from dataset?

I would like to predict the zarr file size given a dataset. For example: new_dataset = .to_dataset() zarr_file = new_dataset.to_zarr() I have currently tried to get the size of new_dataset in bytes via sys.getsizeof(). I then took…
A B • 87
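
sys.getsizeof() only measures the Python wrapper object, so it says little about the arrays. Dataset.nbytes gives the uncompressed size, and the compressed on-disk size really has to be measured (or extrapolated from a sample write), since it depends on the compressor and the data. A small sketch with made-up data:

    import os
    import numpy as np
    import xarray as xr

    new_dataset = xr.Dataset({"a": (("x", "y"), np.random.rand(1_000, 1_000))})
    print("uncompressed:", new_dataset.nbytes, "bytes")   # sum of in-memory array sizes

    new_dataset.to_zarr("est.zarr", mode="w")
    on_disk = sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, files in os.walk("est.zarr")
        for name in files
    )
    print("on disk (compressed):", on_disk, "bytes")
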
0 votes • 1 answer

disable zarr.open() output in console

I have unwanted outputs in the console from the zarr.open() method. It does not have a 'verbose'-like parameter. How can I get rid of those console outputs? I'm currently trying to open .ims files (Imaris pictures) and thus using the zarr library through this…
Willy Lutz • 134
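
zarr.open() itself does not normally print anything, so the messages most likely come from the wrapper library that reads the .ims file; one blunt workaround is to silence stdout around the call. A sketch with a placeholder path:

    import contextlib
    import io
    import zarr

    with contextlib.redirect_stdout(io.StringIO()):   # swallow chatty print() output
        store = zarr.open("example.zarr", mode="r")
    # contextlib.redirect_stderr works the same way if the noise goes to stderr
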
0 votes • 1 answer

How to create and return a Zarr file from xarray Dataset?

How would I go about creating and returning a file new_zarr.zarr from an xarray Dataset? I know xarray.Dataset.to_zarr() exists, but this returns a ZarrStore and I must return a bytes-like object. I have tried using the tempfile module but am unsure how…
A B • 87
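
A Zarr store is a directory tree of many small files, so "a bytes-like object" usually means archiving it; one hedged approach (file names are placeholders) is to write to a temporary directory and zip the result:

    import shutil
    import tempfile
    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"a": (("x",), np.arange(10))})

    with tempfile.TemporaryDirectory() as tmp:
        ds.to_zarr(f"{tmp}/new_zarr.zarr", mode="w")
        archive = shutil.make_archive(f"{tmp}/new_zarr", "zip", f"{tmp}/new_zarr.zarr")
        with open(archive, "rb") as fh:
            payload = fh.read()  # bytes object that can be returned or uploaded
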
0 votes • 1 answer

Slow reading of small zarr/S3 data through python-xarray inside a dockerized fastAPI app

I have a tiny dataset like this: Dimensions: (time: 24) Coordinates: * time (time) datetime64[ns] 2022-09-28 ... 2022-09-28T23:00:00 spatial_ref int64 0 Data variables: CO (time) float32…
PierreL • 169
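
A frequent cause of this pattern is re-opening the S3 store inside every request handler; opening (and, for a 24-value dataset, simply loading) it once per worker process usually removes most of the latency. A sketch with placeholder bucket and route names:

    import s3fs
    import xarray as xr
    from fastapi import FastAPI

    app = FastAPI()

    # Opened once at import time, not once per request.
    fs = s3fs.S3FileSystem(anon=True)
    ds = xr.open_zarr(s3fs.S3Map("my-bucket/tiny.zarr", s3=fs)).load()

    @app.get("/co/{hour}")
    def co(hour: int):
        return {"CO": float(ds["CO"].isel(time=hour))}
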
0 votes • 0 answers

Zarr slow read speed of 43.82 GB file with Xarray

I want to look up 8760 times for a single lat/lon combo in less than a second from a 43.82 GB file of wind data containing: 8760 times (every hour in a year), 721 latitudes (every 0.25° from -90.0° to 90.0°), 1440 longitudes (every 0.25° from -180.0° to…
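
With one chunk per time step covering the whole globe, a single-point time series touches 8760 chunks; rechunking so that each chunk holds the full time axis for a small lat/lon tile turns the lookup into a one-chunk read. A hedged sketch (paths, dimension names, and tile sizes are guesses; the rechunker package does the same job with bounded memory):

    import xarray as xr

    ds = xr.open_zarr("wind.zarr")
    ds = ds.chunk({"time": -1, "latitude": 20, "longitude": 20})
    for var in ds.variables.values():
        var.encoding.pop("chunks", None)      # drop stale on-disk chunking hints
    ds.to_zarr("wind_timeseries.zarr", mode="w")

    point = xr.open_zarr("wind_timeseries.zarr").sel(
        latitude=52.0, longitude=4.25, method="nearest").load()
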
0 votes • 1 answer

How to create Zarr array from a large stack of npy files?

I have a stack of 4-dimensional numpy arrays saved as .npy files. Each one is about 1.5 GB and I have 240 files, so about 360 GB total and much larger than memory. I want to combine them into a single Zarr array in a Google Cloud Storage bucket. My…
qsfzy • 554
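
One memory-safe route is to pre-create the target array with a leading file axis and stream one memory-mapped .npy into it at a time; for Google Cloud Storage the local path can be swapped for a gcsfs mapper. Paths, chunk sizes, and the stacking axis below are assumptions:

    import glob
    import numpy as np
    import zarr

    files = sorted(glob.glob("stack/*.npy"))
    sample = np.load(files[0], mmap_mode="r")       # reads the header, not the data

    # For GCS: store = gcsfs.GCSFileSystem().get_mapper("my-bucket/combined.zarr")
    z = zarr.open(
        "combined.zarr", mode="w",
        shape=(len(files), *sample.shape),
        chunks=(1, *[min(s, 256) for s in sample.shape]),
        dtype=sample.dtype,
    )

    for i, path in enumerate(files):
        z[i] = np.load(path, mmap_mode="r")         # at most one file's data in flight
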
0 votes • 1 answer

Initialize larger-than-memory Xarray Dataset

I would like to initialize a very large Xarray dataset (as an on-disk Zarr store, if possible) for later processing; various parts (spatial subsets) of the dataset will be populated by a different script. This won't work, because the dataset obviously…
HyperCube • 3,870
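
The usual pattern is to build the Dataset from lazy dask arrays, write only the metadata with compute=False, and then let each script fill its spatial subset with region=. A sketch with invented shapes, chunking, and variable names:

    import dask.array as da
    import numpy as np
    import xarray as xr

    template = xr.Dataset(
        {"var": (("time", "y", "x"),
                 da.zeros((1_000, 2_000, 2_000), chunks=(100, 500, 500), dtype="f4"))}
    )
    template.to_zarr("big.zarr", mode="w", compute=False)   # writes metadata only

    # Later, possibly from another script, fill one spatial block:
    block = xr.Dataset(
        {"var": (("time", "y", "x"), np.ones((1_000, 500, 500), dtype="f4"))}
    )
    block.to_zarr("big.zarr", mode="r+",
                  region={"time": slice(0, 1_000),
                          "y": slice(0, 500), "x": slice(0, 500)})
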
0 votes • 1 answer

Add attributes to an existing Zarr storage

I have a zarr store that I open using xarray and zarr: report = xr.open_zarr(grid_file_name), where grid_file_name points to a local zarr directory. I need to add some attributes to the store, and I can add them to the xarray object by: report =…
Shofus • 1
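
Attributes can also be edited in place through the zarr API instead of round-tripping through xarray; if the store was written with consolidated metadata, it should be re-consolidated afterwards. The path, attribute, and variable names below are placeholders:

    import zarr

    grid_file_name = "grid.zarr"                       # placeholder path
    grp = zarr.open_group(grid_file_name, mode="r+")
    grp.attrs["history"] = "attributes added later"    # group-level attribute
    grp["some_variable"].attrs["units"] = "m"          # per-variable attribute
    zarr.consolidate_metadata(grid_file_name)          # keep .zmetadata in sync
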
0 votes • 1 answer

Disallow mpi4py to interfere with internal handling of mpi by a python API

I am using mpiexec on a cluster to run large-scale simulations using pyNEST (mpiexec -n $N python simulate.py). I export a large number of small files, which often exceeds my inode quota on the cluster. So I am trying to reduce the number of…
Ady • 9
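
If the clash comes from mpi4py calling MPI_Init on import and MPI_Finalize at exit, its runtime configuration can turn both off so that pyNEST keeps control of the MPI lifecycle; a minimal sketch:

    import mpi4py
    mpi4py.rc.initialize = False   # no MPI_Init when mpi4py.MPI is imported
    mpi4py.rc.finalize = False     # no MPI_Finalize at interpreter exit

    from mpi4py import MPI         # imported without touching MPI state
    import nest                    # pyNEST initializes and finalizes MPI itself
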
0 votes • 1 answer

Speed up git add for binary files - disable compression

I am working on a new data structure that is version-friendly, so I have a git repository inside a Zarr file. I don't push or upload any data; it's just local version control. Currently, the git add . command is taking a lot of time, and git lfs is…
mzouink • 487
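
If zlib compression of large binary objects is what makes git add slow, git can be told to skip it and to avoid delta-compressing big files; a per-repository sketch (values are starting points to benchmark, not recommendations):

    # additions to .git/config
    [core]
        compression = 0          # skip zlib effort for loose objects and packs
        looseCompression = 0
        bigFileThreshold = 1m    # files above 1 MB are stored whole, without deltas
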
0 votes • 1 answer

Dask where returns NaN on valid array

I'm trying to accelerate my numpy code using dask. The following is part of my numpy code: arr_1 = np.load('.npy') arr_2 = np.load('.npy') arr_3 = np.load('.npy') arr_1 = np.concatenate((arr_1,…
F Baig • 339
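
As a sanity check, da.where follows numpy semantics and does not introduce NaNs on aligned, same-shaped inputs; unexpected NaNs usually point at mismatched shapes/chunks or a dtype change during concatenation. A self-contained sketch with random placeholder arrays (the original .npy files are not shown in the excerpt):

    import numpy as np
    import dask.array as da

    arr_1 = da.from_array(np.random.rand(1_000, 1_000), chunks=(250, 250))
    arr_2 = da.from_array(np.random.rand(1_000, 1_000), chunks=(250, 250))

    out = da.where(arr_1 > 0.5, arr_1, arr_2).compute()   # elementwise, numpy-style
    assert not np.isnan(out).any()                        # no NaNs on valid inputs
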
0 votes • 1 answer

How to convert Zarr data to GeoTiff?

I want to load the HRRR forecast data into Google Earth Engine, so I think I need to convert it to GeoTiff, e.g.: import xarray as xr import s3fs fs = s3fs.S3FileSystem(anon=True) urls =…
Adair • 1,697
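
One possible route (not necessarily what the asker settled on) is to pull a single 2-D field into xarray, attach a CRS with rioxarray, and write a GeoTIFF that Earth Engine can ingest; the variable name, CRS, and paths here are placeholders (HRRR actually uses a Lambert conformal grid):

    import xarray as xr
    import rioxarray  # noqa: F401 -- registers the .rio accessor

    ds = xr.open_zarr("hrrr_subset.zarr")
    band = ds["TMP"].isel(time=0)               # one 2-D field
    band = band.rio.write_crs("EPSG:4326")      # placeholder CRS
    band.rio.to_raster("hrrr_tmp.tif")          # GeoTIFF on disk
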