I have an H5 file, named file.h5, which stores an infrared image. The file is 282 KB on disk:
$ ls -l -sh file.h5
688 -rw-r--r-- 1 user staff 282K Feb 2 00:25 file.h5
First, I load the file in Python using the h5py library:
>>> import h5py
>>> hf = h5py.File('file.h5', 'r')
>>> data = hf['infrared'][:]
Then, I store the same data (read from the 282 KB H5 file) in a new H5 file:
>>> hf2 = h5py.File('file2.h5', 'w')
>>> hf2.create_dataset('infrared', data=data)
Since no processing has been applied to the data and no new fields have been added to the H5 file, I would expect the new file to be exactly the same size. However, to my surprise, the new H5 file is 2 MB!
$ ls -l -sh file2.h5
4104 -rw-r--r-- 1 user staff 2.0M Feb 2 00:39 file2.h5
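One extra check that might help pin this down (just a sketch using h5py's low-level get_storage_size() call; I am not reproducing its outputs here, since they depend on the file) is to ask how many bytes each dataset actually occupies inside its file:
>>> # Sketch: query the storage allocated for each dataset (low-level h5py API)
>>> hf['infrared'].id.get_storage_size()    # bytes allocated in file.h5
>>> hf2['infrared'].id.get_storage_size()   # bytes allocated in file2.h5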
EDIT: Having a closer look
Below, I list some of the dataset properties suggested in the comments, for the dataset in each file (old and new).
Dataset in old file (hf)
>>> hf['infrared']
<HDF5 dataset "infrared": shape (512, 512), type "<f8">
>>> hf['infrared'].size
262144
>>> hf['infrared'].shape
(512, 512)
>>> hf['infrared'].dtype
dtype('float64')
>>> hf['infrared'].chunks
(256, 256)
>>> hf['infrared'].compression
'gzip'
>>> hf['infrared'].shuffle
False
Dataset in new file (hf2)
>>> hf2['infrared']
<HDF5 dataset "infrared": shape (512, 512), type "<f8">
>>> hf2['infrared'].size
262144
>>> hf2['infrared'].shape
(512, 512)
>>> hf2['infrared'].dtype
dtype('float64')
>>> hf2['infrared'].chunks
>>> hf2['infrared'].compression
>>> hf2['infrared'].shuffle
False
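So the dataset in the old file is chunked in (256, 256) blocks and gzip-compressed, while in the new file .chunks and .compression return None (the interpreter prints nothing for None), i.e. the copy is contiguous and uncompressed. 262144 float64 values take 262144 × 8 = 2,097,152 bytes, which matches the 2 MB of file2.h5. If this is indeed the cause, the following sketch shows two ways I could try to preserve the original storage settings (the file names file3.h5 and file4.h5 are just placeholders):
>>> # Option 1: recreate the dataset with the same chunking and compression
>>> hf3 = h5py.File('file3.h5', 'w')
>>> hf3.create_dataset('infrared', data=data, chunks=(256, 256), compression='gzip')
>>> # Option 2: copy the dataset object itself, which should keep its creation
>>> # properties (chunk layout, compression filter) instead of writing a plain array
>>> hf4 = h5py.File('file4.h5', 'w')
>>> hf.copy('infrared', hf4)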