
After transferring my HDF5 file to an Amazon EC2 Linux instance, it seems that I cannot see the datasets in that file (5 GB, md5sum checked after transfer).
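For reference, a minimal sketch of how the checksum can be compared on both machines (the path here is illustrative); it hashes the file in blocks with hashlib so the 5 GB file is not read into memory at once:

import hashlib

def md5sum(path, block_size=2**20):
    # Hash the file in 1 MB blocks instead of loading it whole.
    h = hashlib.md5()
    with open(path, 'rb') as fh:
        for block in iter(lambda: fh.read(block_size), b''):
            h.update(block)
    return h.hexdigest()

print(md5sum('DATA/DATA.h5'))  # the digest should match on both machines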

When I run this code:

import h5py
h5_fname = 'DATA\DATA.h5'
print (h5py.version.info)
f = h5py.File(h5_fname, 'r')
print(f)
for name in f:
    print(name)
    print(f[name].shape)
f.close() 

On my local computer I get the following (which is correct):

h5py    2.6.0
HDF5    1.8.15
Python  3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
sys.platform    win32
sys.maxsize     9223372036854775807
numpy   1.12.0

<HDF5 file "DATA.h5" (mode r)>
X_train
(1397, 1, 128, 128, 128)
y_train
(1397, 1)
i_train
(1397, 1)
X_test
(198, 1, 128, 128, 128)
y_test
(198, 1)
i_test
(198, 1)

When run on the Amazon instance I get:

h5py    2.6.0
HDF5    1.8.17
Python  3.5.1 (default, Sep 13 2016, 18:48:37)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)]
sys.platform    linux
sys.maxsize     9223372036854775807
numpy   1.11.3

<HDF5 file "DATA\DATA.h5" (mode r)>

There are version differences, but I don't think they are the problem here. Any suggestions?
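A minimal diagnostic sketch, assuming the same h5_fname as in the snippet above: it reports the size on disk and walks every group and dataset with visititems, in case anything is nested below the root group:

import os
import h5py

h5_fname = 'DATA\DATA.h5'  # same path as in the snippet above
print('size on disk:', os.path.getsize(h5_fname))  # should be roughly 5 GB

with h5py.File(h5_fname, 'r') as f:
    print('top-level keys:', list(f.keys()))
    # visititems visits every group and dataset, even nested ones
    f.visititems(lambda name, obj: print(name, obj))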

Edit: The code I used to create the HDF5 file may be useful:

import h5py
import numpy as np

def create_h5(fname_):
    f = h5py.File(fname_, 'w', libver='latest')
    dtype_ = h5py.special_dtype(vlen=bytes)  # variable-length byte strings for the i_* datasets
    num_samples_train = 1397
    num_samples_test = 1595 - 1397
    chunks_ = (1, 1, 128, 128, 128)  # one full 128x128x128 volume per chunk (~8 MB as float32, uncompressed)
    chunks_2 = (1, 1)

    f.create_dataset('X_train', (num_samples_train, 1, 128, 128, 128), dtype=np.float32, maxshape=(None, None, None, 128, 128), chunks=chunks_, compression="gzip")
    f.create_dataset('y_train', (num_samples_train, 1), dtype=np.int32, maxshape=(None, 1), chunks=chunks_2, compression="gzip")
    f.create_dataset('i_train', (num_samples_train, 1), dtype=dtype_, maxshape=(None, 1), chunks=chunks_2, compression="gzip")

    f.create_dataset('X_test', (num_samples_test, 1, 128, 128, 128), dtype=np.float32, maxshape=(None, None, None, 128, 128), chunks=chunks_, compression="gzip")
    f.create_dataset('y_test', (num_samples_test, 1), dtype=np.int32, maxshape=(None, 1), chunks=chunks_2, compression="gzip")
    f.create_dataset('i_test', (num_samples_test, 1), dtype=dtype_, maxshape=(None, 1), chunks=chunks_2, compression="gzip")

    f.flush()
    f.close()
    print('HDF5 file created')
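A hypothetical usage call (the forward-slash path and the reopen check are illustrative, not part of the original code):

create_h5('DATA/DATA.h5')
with h5py.File('DATA/DATA.h5', 'r') as f:
    print(list(f.keys()))  # should list the six datasets created above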

1 Answer


Changing h5_fname = 'DATA\DATA.h5' to h5_fname = 'DATA//DATA.h5' solved the problem. The backslash is a path separator only on Windows; on Linux, 'DATA\DATA.h5' is interpreted as a single literal filename rather than as DATA.h5 inside the DATA directory.

However, it is still strange that the file could be opened at all with the first path.
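More generally, building the path with os.path.join (or pathlib) avoids hard-coding either separator; a minimal sketch of the reading code with that change:

import os
import h5py

h5_fname = os.path.join('DATA', 'DATA.h5')  # uses '\' on Windows and '/' on Linux

with h5py.File(h5_fname, 'r') as f:
    for name in f:
        print(name, f[name].shape)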
