I have some HDF5 data that was created with PyTables. The data is very large (an array of 3973850000 x 8 double-precision values), but with PyTables compression it can easily be stored.
I want to access this data using Fortran. I do:
PROGRAM HDF_READ
USE HDF5
IMPLICIT NONE
CHARACTER(LEN=100), PARAMETER :: filename = 'example.h5'
CHARACTER(LEN=100), PARAMETER :: dsetname = 'example_dset.h5'
INTEGER :: error
INTEGER(HID_T) :: file_id
INTEGER(HID_T) :: dset_id
INTEGER(HID_T) :: space_id
INTEGER(HSIZE_T), DIMENSION(2) :: data_dims, max_dims
DOUBLE PRECISION, DIMENSION(:,:), ALLOCATABLE :: dset_data
!Initialize Fortran interface
CALL h5open_f(error)
!Open an existing file
CALL h5fopen_f(filename, H5F_ACC_RDONLY_F, file_id, error)
!Open a dataset
CALL h5dopen_f(file_id, dsetname, dset_id, error)
!Get dataspace ID
CALL h5dget_space_f(dset_id, space_id, error)
!Get dataspace dims
CALL h5sget_simple_extent_dims_f(space_id, data_dims, max_dims, error)
!Create array to read into
ALLOCATE(dset_data(data_dims(1), data_dims(2)))
!Get the data
CALL h5dread_f(dset_id, H5T_NATIVE_DOUBLE, dset_data, data_dims, error)
!Close the dataset, dataspace, file and Fortran interface
CALL h5dclose_f(dset_id, error)
CALL h5sclose_f(space_id, error)
CALL h5fclose_f(file_id, error)
CALL h5close_f(error)
END PROGRAM HDF_READ
However, this creates an obvious problem: the array cannot be allocated at such a large size, because as double-precision floats it is greater than the system memory.
What is the best method for accessing this data? My current thought is some sort of chunking method, or is there a way to keep the array on disk? Does HDF5 have methods for dealing with data this large? I have read around but can find nothing pertaining to my case.
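For reference, here is a minimal sketch of the kind of chunking I have in mind: reading the dataset one block of rows at a time through hyperslab selections instead of all at once. The block size (nrows) is an arbitrary value I would tune, and I believe the HDF5 Fortran interface reports the dimensions in reverse order, so the 3973850000 x 8 array written from Python would appear as 8 x 3973850000 here. Is something like this the intended approach?
PROGRAM HDF_READ_BLOCKS
USE HDF5
IMPLICIT NONE
CHARACTER(LEN=100), PARAMETER :: filename = 'example.h5'
CHARACTER(LEN=100), PARAMETER :: dsetname = 'example_dset.h5'
INTEGER(HSIZE_T), PARAMETER :: nrows = 1000000 !rows per block - an arbitrary size to tune
INTEGER :: error
INTEGER(HID_T) :: file_id, dset_id, file_space, mem_space
INTEGER(HSIZE_T), DIMENSION(2) :: data_dims, max_dims, offset, count
DOUBLE PRECISION, DIMENSION(:,:), ALLOCATABLE :: buf
!Initialize Fortran interface and open the file and dataset
CALL h5open_f(error)
CALL h5fopen_f(filename, H5F_ACC_RDONLY_F, file_id, error)
CALL h5dopen_f(file_id, dsetname, dset_id, error)
!Get the file dataspace and its dimensions (8 x 3973850000 as seen from Fortran)
CALL h5dget_space_f(dset_id, file_space, error)
CALL h5sget_simple_extent_dims_f(file_space, data_dims, max_dims, error)
!Buffer holding all columns for one block of rows
ALLOCATE(buf(data_dims(1), nrows))
count = (/ data_dims(1), nrows /)
CALL h5screate_simple_f(2, count, mem_space, error)
offset = (/ 0_HSIZE_T, 0_HSIZE_T /)
DO WHILE (offset(2) < data_dims(2))
   !The last block may be smaller than nrows
   count(2) = MIN(nrows, data_dims(2) - offset(2))
   !Select the block in the file and in memory, then read it
   CALL h5sselect_hyperslab_f(file_space, H5S_SELECT_SET_F, offset, count, error)
   CALL h5sselect_hyperslab_f(mem_space, H5S_SELECT_SET_F, (/ 0_HSIZE_T, 0_HSIZE_T /), count, error)
   CALL h5dread_f(dset_id, H5T_NATIVE_DOUBLE, buf, count, error, &
                  mem_space_id=mem_space, file_space_id=file_space)
   !... process buf(:, 1:count(2)) here ...
   offset(2) = offset(2) + count(2)
END DO
!Close everything
CALL h5sclose_f(mem_space, error)
CALL h5sclose_f(file_space, error)
CALL h5dclose_f(dset_id, error)
CALL h5fclose_f(file_id, error)
CALL h5close_f(error)
END PROGRAM HDF_READ_BLOCKS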