I have a dataset that I want to append to an existing zarr store, but one dimension in the dataset, 'range', has a different size than in the zarr. I'm hoping to just pad the data with zeros or NaN.

The problem

Here is what happens when I try to append some new data to my zarr:

>>> z = zarr.open('my.zarr', 'r+')
>>> ds = ... # created from other files
>>> ds
<xarray.Dataset>
Dimensions:            (range: 12571, time: 70)
Coordinates:
  * time               (ping_time) datetime64[ns] 2022-05-05T12:50:43.625000 ...
  * range              (range) float64 0.0 1.0 2.0 ... 1.267e+04 1.267e+04
  ...
Data variables:
    mydata             (time, range) float64 7.379...
    ...
>>> z.range.size
12669
>>> ds.to_zarr('my.zarr', append_dim='time') # Problem!!
...
ValueError: variable 'mydata' already exists with different dimension sizes: {'range': 12669} != {'range': 12571}. to_zarr() only supports changing dimension sizes when explicitly appending, but append_dim='time'

From this it seems like it might work if only I could change the size of the 'range' dimension so that it matches that of the existing zarr, i.e. go from:

>>> ds.range.size
12571
>>> 

to

>>> ds.range.size
12669
>>> 

How can I change the size of the coordinates of an existing dimension in the Dataset? Is there another way to append this data to my zarr?

What I've tried:

After looking around for ways to extend my 'range' dimension, I tried assigning new coordinates by creating a new dimension and renaming back and forth, but it does not seem to work:

>>> ds = ds.rename({"range": "old_range"})
>>> ds = ds.expand_dims(dim={"range": z.range.size})
>>> res = np.append(ds.old_range.data, z.range[ds.old_range.data.size : z.range.size])
>>> ds = ds.assign_coords({"range": res})
>>> ds
>>>ds
<xarray.Dataset>
Dimensions:            (range: 12669, time: 70,
                        old_range: 12571)
Coordinates:
  * time          (ping_time) datetime64[ns] 2022-05-05T12:50:43.625000 ...
  * old_range          (old_range) float64 0.0 0.007955 0.01591 ... 99.98 99.99
  * range              (range) float64 0.0 0.007955 0.01591 ... 99.98 99.99
Data variables:
    mydata                 (range, time, old_range) float64 7.379...

So it seems that 'mydata' now has both dimensions, and that the new 'range' dimension has been added to every data variable. How can I fix or avoid this?

By the way, for the reverse case, where ds.range.size > z.range.size, it seems to work to just do z.range.resize(ds.range.size).

vindo

1 Answer


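After some more searching I found something that seems to work. The xarray 'pad' method can extend the 'range' dimension of the dataset to the size used in the zarr; with the default settings, the padded entries of floating-point data are filled with NaN: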

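# Number of missing entries along 'range' (assumes z.range.size >= ds.range.size)
amount_to_pad = z.range.size - ds.range.size
# Pad only at the end of the 'range' dimension
ds = ds.pad(range=(0, amount_to_pad))
ds.to_zarr('my.zarr', append_dim='time')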
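
One thing to watch for: as far as I can tell, 'pad' also pads the 'range' coordinate itself, so the new coordinate entries come out as NaN as well. Below is a minimal, self-contained sketch of padding and then restoring the coordinate values. It uses a small made-up dataset instead of the real one, and `target_range` is a hypothetical stand-in for the values of `z.range[:]`. The helper name `pad_range_to` is also just for illustration, and it assumes the existing 'range' values are a prefix of the target values.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the 'range' values already stored in the zarr
# (in the question this would be something like z.range[:]).
target_range = np.arange(8, dtype="float64")

# Small made-up dataset whose 'range' dimension is shorter than the target.
ds = xr.Dataset(
    {"mydata": (("time", "range"), np.ones((3, 5)))},
    coords={"time": np.arange(3), "range": np.arange(5, dtype="float64")},
)


def pad_range_to(ds, target_range):
    """Pad 'range' at the end with NaN until it matches target_range."""
    n_missing = target_range.size - ds.sizes["range"]
    if n_missing < 0:
        raise ValueError("dataset is longer than the target along 'range'")
    # Default mode is 'constant'; float data is filled with NaN.
    padded = ds.pad(range=(0, n_missing))
    # The padded coordinate entries are NaN too, so overwrite the whole
    # coordinate with the target values (assumed to share the same prefix).
    return padded.assign_coords(range=target_range)


padded = pad_range_to(ds, target_range)
print(padded.sizes["range"])
```

After this, the padded dataset has the same 'range' size and coordinate values as the target, so the `to_zarr(..., append_dim='time')` call above should no longer complain about mismatched dimension sizes. If zeros are preferred over NaN, 'pad' accepts a `constant_values` argument.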
vindo