Is it possible to read a given set of rows from an HDF5 file without loading the whole file? I have quite big HDF5 files with many datasets. Here is an example of what I had in mind to reduce time and memory usage:
#! /usr/bin/env python
import numpy as np
import h5py
infile = 'field1.87.hdf5'
f = h5py.File(infile,'r')
group = f['Data']
mdisk = group['mdisk'].value
val = 2.*pow(10.,10.)
ind = np.where(mdisk>val)[0]
m = group['mcold'][ind]
print m
Note that ind doesn't give consecutive rows, but rather scattered ones.
The above code fails, but it follows the standard way of slicing an hdf5 dataset. The error message I get is:
Traceback (most recent call last):
File "./read_rows.py", line 17, in <module>
m = group['mcold'][ind]
File "/cosma/local/Python/2.7.3/lib/python2.7/site-packages/h5py-2.3.1-py2.7-linux-x86_64.egg/h5py/_hl/dataset.py", line 425, in __getitem__
selection = sel.select(self.shape, args, dsid=self.id)
File "/cosma/local/Python/2.7.3/lib/python2.7/site-packages/h5py-2.3.1-py2.7-linux-x86_64.egg/h5py/_hl/selections.py", line 71, in select
sel[arg]
File "/cosma/local/Python/2.7.3/lib/python2.7/site-packages/h5py-2.3.1-py2.7-linux-x86_64.egg/h5py/_hl/selections.py", line 209, in __getitem__
raise TypeError("PointSelection __getitem__ only works with bool arrays")
TypeError: PointSelection __getitem__ only works with bool arrays
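Going by the error message, a boolean array seems to be accepted where the integer array from np.where is not. Below is a minimal sketch of that idea, plus a variant that passes the indices as a plain Python list in increasing order. The file example.hdf5 and the values in it are made up for illustration and are not my real data; only the group and dataset names ('Data', 'mdisk', 'mcold') come from the code above.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small stand-in file (hypothetical data, not the real field1.87.hdf5).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'example.hdf5')
with h5py.File(path, 'w') as f:
    grp = f.create_group('Data')
    grp.create_dataset('mdisk', data=np.array([1e10, 3e10, 5e9, 4e10, 1e9]))
    grp.create_dataset('mcold', data=np.array([10., 11., 12., 13., 14.]))

val = 2.e10
with h5py.File(path, 'r') as f:
    group = f['Data']
    mdisk = group['mdisk'][...]      # read only the dataset used for the cut
    mask = mdisk > val               # boolean array, one entry per row

    # Variant 1: select rows of another dataset with the boolean mask.
    m_bool = group['mcold'][mask]

    # Variant 2: select rows with a list of increasing integer indices.
    ind = np.where(mask)[0]
    m_list = group['mcold'][ind.tolist()]

print(m_bool)
print(m_list)
```

Both variants read only 'mdisk' in full and then pull the selected rows of 'mcold'. How much time and memory this actually saves probably depends on how the file is stored and on how many rows are selected, so it would need checking on the real files.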