
I am attempting to set up a system to organize data spread across a large number of HDF5 files. From what I've read, the approach that would best suit my needs seems to be an SQLite database containing the paths to the files along with their top-level metadata.

Since I don't have much experience with databases, I'm wondering what the best practices would be for maintaining such a database (which will be used by about five other people). Should I simply write a script that is run each time a file is added, copying its HDF5 metadata into an SQLite table? Any advice would be greatly appreciated.
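To make the question concrete, something like this minimal sketch is what I have in mind, assuming Python with h5py and the standard-library sqlite3 module; the attribute names (`sample`, `date`) are just placeholders for my actual metadata:

```python
import sqlite3
import h5py

def index_file(db_path, h5_path):
    """Copy the top-level (root-group) attributes of one HDF5 file
    into a row of an SQLite table keyed by the file's path."""
    with h5py.File(h5_path, "r") as f:
        # "sample" and "date" are placeholder attribute names.
        meta = {key: f.attrs.get(key) for key in ("sample", "date")}

    con = sqlite3.connect(db_path)
    with con:  # commits on success
        con.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(path TEXT PRIMARY KEY, sample TEXT, date TEXT)"
        )
        # INSERT OR REPLACE so re-indexing a file updates its row.
        con.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
            (h5_path, str(meta["sample"]), str(meta["date"])),
        )
    con.close()

# e.g. index_file("index.db", "run42.h5") after each new file arrives
```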

I'm also wondering whether this is, in general, how HDF5 files are used, or whether it would be more common to put all of one's data in a single HDF5 file that takes the place of a database.

  • You could do that. However, I would suggest you look at _external links_ in HDF5: [API](http://www.hdfgroup.org/HDF5/doc/RM/RM_H5L.html#Link-CreateExternal), [code example](http://www.hdfgroup.org/ftp/HDF5/current/src/unpacked/examples/h5_extlink.c). This way there is no duplication of metadata, and you can write a simple top-level program to update the links in the top-level HDF5 file. Then all data access can be done through the top-level file (see the sketch after these comments). – Timothy Brown Mar 25 '14 at 20:57
  • So in other words, just have an HDF5 file which links to the root group in all the other files? That's a good idea. Thanks! – Hinrik Ingolfsson Mar 27 '14 at 20:57
  • Yes. It'll keep everything in HDF5 (no need to add SQL code, etc). Then you can access the datasets from the top-level file too. – Timothy Brown Mar 27 '14 at 21:22
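A minimal sketch of that top-level linking approach, again assuming Python with h5py (the underlying C call is `H5Lcreate_external`, per the API page linked above); the file names and glob pattern are placeholders:

```python
import glob
import h5py

def rebuild_master(master_path, data_glob="data_*.h5"):
    """Recreate a top-level HDF5 file whose members are external
    links to the root group of each data file found by the glob."""
    with h5py.File(master_path, "w") as master:
        for path in sorted(glob.glob(data_glob)):
            # Link name inside the master file; here just the file stem.
            name = path.rsplit(".", 1)[0]
            # External link to the root group "/" of the data file.
            master[name] = h5py.ExternalLink(path, "/")

# e.g. rebuild_master("master.h5") after adding a new data file
```

Datasets in the linked files can then be opened through the master file, e.g. `h5py.File("master.h5")["data_001/some_dataset"]`, and since the links are cheap to create, the update script can simply rebuild the whole master file each time.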
