I am trying to use h5py to write datasets in HDF5 format. The create_dataset() method has options to choose the type of compression and filters. So far I could not find any resource explaining whether shuffle = True and fletcher32 = True can be used together with compression = 'lzf' or 'gzip'.
import h5py

# Open the file for writing (h5py.File is the entry point; there is no h5py.open)
f = h5py.File("my_hdf_file.h5", "w")
dset = f.create_dataset("zipped_dataset", shape=(778, 181, 128, 128),
                        chunks=True,
                        compression="gzip",
                        compression_opts=9,
                        shuffle=True)
f.close()
I know that the code above is okay, and there are books and web sources that show similar examples. But I could not find any discussion of using shuffle + fletcher32 + gzip/lzf together.
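To make the question concrete, here is a minimal sketch of the combination I mean. The small random array, the file name filters_demo.h5 and the temporary directory are made up for illustration. As far as I can tell, h5py accepts all three keywords in a single create_dataset() call and reports them back through the dataset's properties, but that alone does not tell me whether combining them is advisable.

```python
import os
import tempfile

import h5py
import numpy as np

# Illustrative data only: a small integer array instead of the full-size dataset.
data = np.random.default_rng(0).integers(0, 100, size=(16, 128, 128), dtype=np.int32)

# Hypothetical output location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "filters_demo.h5")

with h5py.File(path, "w") as f:
    dset = f.create_dataset(
        "zipped_dataset",
        data=data,
        chunks=True,          # filters require chunked storage
        compression="gzip",   # or "lzf"
        compression_opts=4,   # gzip level; not used with "lzf"
        shuffle=True,         # byte-shuffle filter
        fletcher32=True,      # checksum filter
    )
    # Read back which filters the dataset reports as enabled.
    filters = (dset.compression, dset.shuffle, dset.fletcher32)

# Reopen the file and check that the data survives the round trip.
with h5py.File(path, "r") as f:
    roundtrip_ok = np.array_equal(f["zipped_dataset"][...], data)

print(filters, roundtrip_ok)
```

If this is a valid way to combine the filters, the remaining question is what each one contributes when the others are already enabled.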
I would like to understand the benefit of using both shuffle and fletcher32 simultaneously (if that is at all possible or advisable). If anyone could explain why this should or should not be done, it would be very helpful.
Resources:
- http://docs.h5py.org/en/latest/high/dataset.html#dataset-compression
- http://docs.h5py.org/en/latest/high/group.html#Group.create_dataset
- Python and HDF5: book by Andrew Collette - Filters and Compression
- This answer to this Stack Overflow question
List of all available filters: https://portal.hdfgroup.org/display/support/Filters