Say I create a dataset with an integer variable.
import xarray as xr
import numpy as np
int_var = np.random.randint(0, 10, 10)
ds = xr.Dataset(data_vars={"int_var": (("x",), int_var)},
                coords={"x": range(10)})
Then I save it, providing an encoding and an integer fill value:
from numcodecs import Blosc
compressor = Blosc(cname='lz4')
encoding = {v: {'compressor': compressor, 'dtype': ds[v].dtype, "_FillValue": -9999}
            for v in ds.data_vars}
ds.to_zarr(store="example.zarr", mode='w', consolidated=True, encoding=encoding)
When I then read the data back, the dtype has changed from an integer type to float64. However, the dtype is still recorded as <i8 in the .zmetadata file, and I can see that the _FillValue is correctly loaded as an int.
# Loads int_var with dtype float64
reloaded = xr.open_zarr("example.zarr", consolidated=True)
I need it to stay an integer type because I'm storing indices, and my job is to make the data easy to use. It's not acceptable for users to have to convert the dtype of every integer column each time they need it.
I noticed that if I just delete _FillValue from the encoding dict, the dtype is preserved. What's going on, and how do I fix it?
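My working guess, sketched below with plain NumPy rather than xarray's actual decoding code: when a _FillValue is present, the reader masks matching entries as NaN, and NaN cannot live in an integer array, so the whole variable gets promoted to float. The helper `mask_fill_value` and the sample values are hypothetical, made up only to illustrate the promotion.

```python
import numpy as np


def mask_fill_value(values, fill_value):
    """Hypothetical helper: replace fill_value entries with NaN."""
    # np.where needs a single dtype that can hold both the integers and NaN,
    # so the result is promoted to a floating-point type.
    return np.where(values == fill_value, np.nan, values)


stored = np.array([3, 7, -9999, 1], dtype="int64")
decoded = mask_fill_value(stored, -9999)

print(stored.dtype)   # int64
print(decoded.dtype)  # float64
```

If that is what is happening here, I assume passing mask_and_scale=False to xr.open_zarr would skip the masking and keep the stored integer dtype, at the cost of leaving -9999 in the data as an ordinary value.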