Is it advisable to use both "shuffle" and "fletcher32" filters alongside gzip or lzf while using h5py to create a dataset?

Question

I am trying to use h5py to write datasets in HDF5 format. The create_dataset() method has options to choose the type of compression and filters. So far I have not found any resource that explains whether shuffle = True and fletcher32 = True can be used together with compression = 'lzf' or 'gzip'.

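import h5py

# h5py.File (not h5py.open) creates or opens an HDF5 file
f = h5py.File("my_hdf_file.h5", "w")
dset = f.create_dataset("zipped_dataset", shape=(778, 181, 128, 128),
                        chunks=True,          # let h5py pick a chunk shape
                        compression="gzip",
                        compression_opts=9,   # gzip level, 0 to 9
                        shuffle=True)         # byte-shuffle before compressing
f.close()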

I know that the code above is okay and there are books and web-sources which show examples of similar type as well. But I could not find any discussion on using shuffle + fletcher32 + gzip/lzf.

I would like to understand the benefit of using both shuffle and fletcher32 simultaneously (if that is at all possible or advisable). If anyone could explain why this should or should not be done, it would be very helpful.
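For concreteness, here is a minimal sketch of the combination being asked about. It assumes that h5py accepts shuffle, fletcher32 and compression together in a single create_dataset() call, and it only checks that the three filters are reported as active and that the data reads back unchanged. It says nothing about whether the combination is advisable. The array contents, the file name "sketch.h5" and the gzip level are illustrative choices, not taken from the real dataset.

```python
import h5py
import numpy as np

# Small illustrative array (hypothetical; stands in for the real data).
data = np.arange(64 * 64, dtype="float32").reshape(64, 64)

# The "core" driver with backing_store=False keeps the file in memory,
# so nothing is written to disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "zipped_dataset",
        data=data,
        chunks=True,            # filters require a chunked layout
        compression="gzip",
        compression_opts=4,     # illustrative gzip level
        shuffle=True,           # byte-shuffle filter
        fletcher32=True,        # per-chunk checksum filter
    )
    # Ask the dataset which filters it reports as enabled.
    filters_enabled = (dset.compression, dset.shuffle, dset.fletcher32)
    # Read everything back and compare with what was written.
    roundtrip_ok = bool(np.array_equal(dset[...], data))

print(filters_enabled, roundtrip_ok)
```

The two filters address different things: shuffle rearranges bytes so that the compressor may find more redundancy, while fletcher32 stores a checksum per chunk so that corruption can be detected on read. Swapping compression="gzip" for compression="lzf" (and dropping compression_opts) should exercise the lzf variant in the same way.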

Resources:

List of all available filters: https://portal.hdfgroup.org/display/support/Filters

CypherX, you have certainly done your research! I inherit HDF5 files created by another application, so optimization hasn't been an issue (yet), and I mostly use PyTables. PyTables has an interesting discussion here: [Optimization tips](https://www.pytables.org/usersguide/optimization.html). The **HDF Group** has 2 blogs that might help: [Performance Tuning](https://www.hdfgroup.org/2017/05/hdf5-data-compression-demystified-2-performance-tuning/). It has links to several additional references. Good luck. – kcw78, May 25 '19 at 16:38
@kcw78: Thank you. Those links helped somewhat, though I will need to take a deeper dive. The good thing is that until now I thought PyTables could only be used for tabular data. Thanks to your post, I checked and learned that [**PyTables could be used for working with multidimensional arrays**](https://stackoverflow.com/questions/8843062/python-how-to-store-a-numpy-multidimensional-array-in-pytables) as well. – CypherX, May 27 '19 at 00:44
Yes, PyTables supports multidimensional arrays. The info in that thread is "a little dated". Use an EArray (Extendable Array) if you need to add data to the array after initial creation. New rows can be added to the end of an enlargeable array by using the `EArray.append()` method. – kcw78, May 27 '19 at 16:02
