I'm running into a very strange issue when trying to create a rather large dataset with h5py.
For example:

import h5py
import numpy as np

test_h5 = h5py.File('test.hdf5', 'w')

# Sizes tried (one at a time):
# n = 3055693983    # fails
# n = 10000000000   # works
# n = 40000000000   # fails
# n = 100000000000  # works
# n = 20000000000   # fails
n = 512             # works

test_h5.create_dataset('matrix', shape=(n, n), dtype=np.int8,
                       compression='gzip', chunks=(256, 256))
print(test_h5['matrix'].shape)

a = test_h5['matrix']
a[0:256, 0:256] = np.ones((256, 256))
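
For reference, here is a minimal sketch of how I tested the different sizes (each pass recreates the file from scratch and writes a single 256x256 chunk; the sizes are the ones listed above):

import h5py
import numpy as np

# Try each size in turn: create the chunked dataset in a fresh file,
# then write a single 256x256 chunk and see whether it raises IOError.
for n in [3055693983, 10000000000, 40000000000, 100000000000, 20000000000, 512]:
    try:
        with h5py.File('test.hdf5', 'w') as f:
            dset = f.create_dataset('matrix', shape=(n, n), dtype=np.int8,
                                    compression='gzip', chunks=(256, 256))
            dset[0:256, 0:256] = np.ones((256, 256))
        print('%d works' % n)
    except IOError as e:
        print('%d fails: %s' % (n, e))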
The chunk size is (256, 256).
If the dataset shape is set to (512, 512), everything works AOK.
If the dataset shape is set to (100000000000, 100000000000), everything works AOK...
Ideally I want a dataset of shape (3055693983, 3055693983), which fails with the following:
(3055693983, 3055693983)
Traceback (most recent call last):
  File "h5.py", line 16, in <module>
    a[0:256,0:256]=np.ones((256,256))
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2696)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2654)
  File "/home/user/anaconda2/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 618, in __setitem__
    self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2696)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2654)
  File "h5py/h5d.pyx", line 221, in h5py.h5d.DatasetID.write (/home/ilan/minonda/conda-bld/work/h5py/h5d.c:3527)
  File "h5py/_proxy.pyx", line 132, in h5py._proxy.dset_rw (/home/ilan/minonda/conda-bld/work/h5py/_proxy.c:1889)
  File "h5py/_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite (/home/ilan/minonda/conda-bld/work/h5py/_proxy.c:1599)
IOError: Can't prepare for writing data (Can't retrieve number of elements in file dataset)
Setting the dataset to a few other random sizes produced mixed results: some work, some do not... I thought it might be something simple, like the dataset size not being evenly divisible by the chunk size, but that does not appear to be the issue.
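
For what it's worth, a quick arithmetic check of the trial sizes against the chunk edge:

# Check whether each trial size divides evenly by the chunk edge (256).
# 20000000000 and 40000000000 divide evenly yet still fail, while
# 3055693983 does not divide evenly -- so divisibility alone doesn't
# explain which sizes work.
for n in [3055693983, 10000000000, 40000000000, 100000000000, 20000000000, 512]:
    print('%d %% 256 = %d' % (n, n % 256))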
What am I missing here?