I have several TB of image data, currently stored in many HDF5 files written with PyTables, one file per frame. Each file contains two groups, "LabelData" and "SensorData".
I have created a single (small) file that holds all the file names and some metadata; with its help, I can open any needed HDF5 data from a Python generator.
This gives me a lot of flexibility; however, it seems quite slow, since every single file has to be opened and closed.
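For context, this is roughly what my current generator looks like (names are simplified, and the file list actually comes from the metadata file):

import tables as tb

def frame_generator(filenames):
    for filepath in filenames:
        # open one per-frame file, hand its two groups to the caller,
        # then close the file before moving on to the next frame
        with tb.open_file(filepath, 'r') as f:
            yield f.root.LabelData, f.root.SensorData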
Now I want to create a single HDF5 file with external links to the other files. Would that speed up the process?
As I understand it, creating external links requires creating a node for each link. However, I get the following performance warning:
PerformanceWarning: group ``/`` is exceeding the recommended maximum number of children (16384); be ready to see PyTables asking for lots of memory and possibly slow I/O.
This is how I have created the file:
import tables as tb

def createLinkFile(linkfile, filenames, linknames):
    # Create a new file
    f1 = tb.open_file(linkfile, 'w')
    for filepath, linkname in zip(filenames, linknames):
        # one group per frame, holding the two external links
        data = f1.create_group('/', linkname)
        # create an external link to each group in the target file
        f1.create_external_link(data, 'LabelData', filepath + ':/LabelData')
        f1.create_external_link(data, 'SensorData', filepath + ':/SensorData')
    f1.close()
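For completeness, this is roughly how I would then read through the link file; as far as I understand, calling an ExternalLink node dereferences it and returns the node in the target file:

with tb.open_file(linkfile, 'r') as f1:
    for group in f1.root:
        # calling the link opens the external file and returns the node
        label_node = group.LabelData()
        sensor_node = group.SensorData()
        # ... process label_node / sensor_node ...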
Is there a better way?