
Recently I have started working with .hdf5 files and still can't figure out how to properly use external links.

I have a few .hdf5 files. Each file has the same structure, e.g. the same keys and data types. I want to merge them into one file but keep them separate, with a different key for each file.

Here's what I do:

myfile = h5py.File("/path_to_the_directory/merged_images.hdf5", 'w')
myfile['0.000'] = h5py.ExternalLink("img_000.hdf5", "/path_to_the_directory/images")
myfile['0.001'] = h5py.ExternalLink("img_001.hdf5", "/path_to_the_directory/images")
myfile.flush()

Then I try to read it with:

myfile = h5py.File("/path_to_the_directory/merged_images.hdf5", 'r')
keys = list(myfile.keys())
print(keys)
print(list(myfile[keys[0]]))

The line print(keys) gives me ['0.000', '0.001']. So, I believe the file's structure is okay.

The next line, however, raises an exception: KeyError: "Unable to open object (unable to open external file, external link file name = 'img_000.hdf5')"

Am I doing something wrong? The documentation is pretty poor and I haven't found a relevant use case there.

Denis

1 Answer


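The problem is that you are mixing up paths: the second argument of your h5py.ExternalLink appears to be a location on your hard drive, whereas it should be a path inside the target file. It is important to distinguish between two types of paths: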

  • File path: the location of the file on your hard drive.
  • Dataset path: the location of the dataset inside the HDF5 file. This path is internal to the file and does not depend on where you store the file.

The syntax of h5py.ExternalLink, as mentioned in the documentation, is:

myfile['/path/of/link'] = h5py.ExternalLink('/path/to/file.hdf5', '/path/to/dataset')
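
To check which of the two arguments ended up where, the link can be inspected without following it. The sketch below is only an illustration: the file names target.hdf5 and main.hdf5 and the dataset path /images are made up, and the files are written to a temporary directory. It relies on HDF5 looking for a relative link target next to the file that holds the link.

```python
import os
import tempfile

import h5py
import numpy as np

# Work in a throwaway directory so the sketch leaves nothing in the current one.
workdir = tempfile.mkdtemp()
target_file = os.path.join(workdir, "target.hdf5")  # hypothetical file name
main_file = os.path.join(workdir, "main.hdf5")      # hypothetical file name

# The target file holds a dataset at the internal path /images.
with h5py.File(target_file, "w") as f:
    f["/images"] = np.arange(3)

# First argument: file path on disk. Second argument: path inside that file.
with h5py.File(main_file, "w") as f:
    f["/link"] = h5py.ExternalLink("target.hdf5", "/images")

with h5py.File(main_file, "r") as f:
    # getlink=True returns the link object itself instead of following it.
    link = f.get("/link", getlink=True)
    link_filename = link.filename
    link_path = link.path
    # Following the link reads the dataset stored in the other file.
    values = f["/link"][...].tolist()

print(link_filename, link_path, values)
```

If link.path shows a directory on disk rather than a path inside the target file, the arguments have been mixed up as in the question.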

I would also encourage you to use a relative file path for the ExternalLink. HDF5 looks for a relative path in, among other places, the directory of the file that contains the link, so everything will continue to work even if you move the collection of files together to a new location on your hard drive (or give them to somebody else).
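
A minimal sketch of that claim, using made-up names (data.hdf5, index.hdf5, old_location, new_location) inside a temporary directory: the link is created with a relative file path, the whole directory is moved, and the link is followed from the new location.

```python
import os
import shutil
import tempfile

import h5py
import numpy as np

# All names below are hypothetical and live in a throwaway directory.
base = tempfile.mkdtemp()
old_dir = os.path.join(base, "old_location")
new_dir = os.path.join(base, "new_location")
os.makedirs(old_dir)

# The file that actually stores the data.
with h5py.File(os.path.join(old_dir, "data.hdf5"), "w") as f:
    f["/values"] = np.array([1, 2, 3])

# The file that only links to it, using a relative file path.
with h5py.File(os.path.join(old_dir, "index.hdf5"), "w") as f:
    f["/linked"] = h5py.ExternalLink("data.hdf5", "/values")

# Move both files together to a new location.
shutil.move(old_dir, new_dir)

# The link still resolves, because data.hdf5 sits next to index.hdf5.
with h5py.File(os.path.join(new_dir, "index.hdf5"), "r") as f:
    moved_values = f["/linked"][...].tolist()

print(moved_values)
```

With an absolute file path in the link, the same move would leave the link pointing at the old, now missing, location.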

With the correct paths, your example works, as shown below.

Note that, to illustrate my remark about relative file paths, I have made all paths of the datasets absolute (these are only internal to the file, and do not depend on where the file is stored on the hard drive) while I kept the file paths relative.

import h5py
import numpy as np

# Create two source files, each with a dataset at the internal path /path/to/data.
with h5py.File('test_a.hdf5', 'w') as myfile:
    myfile['/path/to/data'] = np.array([0, 1, 2])

with h5py.File('test_b.hdf5', 'w') as myfile:
    myfile['/path/to/data'] = np.array([3, 4, 5])

# Link to both datasets: relative file path first, internal dataset path second.
with h5py.File('test.hdf5', 'w') as myfile:
    myfile['/a'] = h5py.ExternalLink('test_a.hdf5', '/path/to/data')
    myfile['/b'] = h5py.ExternalLink('test_b.hdf5', '/path/to/data')

# Read the data back through the links.
with h5py.File('test.hdf5', 'r') as myfile:
    keys = list(myfile.keys())
    print(keys)
    print(list(myfile[keys[0]]))
    print(list(myfile[keys[1]]))

Prints (as expected):

['a', 'b']
[0, 1, 2]
[3, 4, 5]
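
Applied to the layout in the question, the same idea might look like the sketch below. It assumes, purely for illustration, that each img_*.hdf5 file stores a dataset named images at its root. The question does not show the real structure, so the stand-in files are created here in a temporary directory. Each key of the merged file links to the root group "/" of one source file, which keeps the files separate under their own keys.

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()  # throwaway directory for the sketch

# Stand-ins for the real image files; the dataset name "images" is an assumption.
sources = {"0.000": "img_000.hdf5", "0.001": "img_001.hdf5"}
for i, filename in enumerate(sources.values()):
    with h5py.File(os.path.join(workdir, filename), "w") as f:
        f["images"] = np.full((2, 2), i)

# Each key points at the root group of one source file.
merged_path = os.path.join(workdir, "merged_images.hdf5")
with h5py.File(merged_path, "w") as merged:
    for key, filename in sources.items():
        merged[key] = h5py.ExternalLink(filename, "/")

with h5py.File(merged_path, "r") as merged:
    keys = list(merged.keys())
    # Following a link yields the root group of the linked file.
    members = list(merged[keys[0]].keys())
    first_image = merged[keys[1]]["images"][...].tolist()

print(keys, members, first_image)
```

To expose only one dataset per file instead of the whole file, the second argument would be that dataset's internal path, for example "/images" under the same assumption.
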
Tom de Geus