
I have several TB of image data, currently stored in many HDF5 files created with PyTables, one file per frame. Each file contains two groups, "LabelData" and "SensorData".

I have created a single (small) file that holds all the file names and some metadata, and with the help of that file I can open any needed HDF5 data from a Python generator.
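A simplified sketch of what that generator does (the metadata lookup is omitted and the names are illustrative):

import tables as tb

def frame_generator(filenames):
    # Open one frame file at a time; the file is closed again as soon
    # as the consumer asks for the next frame.
    for filepath in filenames:
        with tb.open_file(filepath, 'r') as f:
            yield f.root.LabelData, f.root.SensorData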

This gives me a lot of flexibility, but it seems quite slow, since every single file has to be opened and closed.

Now I want to create a single HDF5 file with external links to the other files. Would that speed up the process?

As I understand it, creating external links requires creating a node for each link. However, I get the following performance warning:

PerformanceWarning: group / is exceeding the recommended maximum number of children (16384); be ready to see PyTables asking for lots of memory and possibly slow I/O.

This is how I have created the file:

import tables as tb

def createLinkFile(linkfile, filenames, linknames):
    # Create a new file
    f1 = tb.open_file(linkfile, 'w')

    for filepath, linkname in zip(filenames, linknames):

        # one group per frame, directly under the root group
        data = f1.create_group('/', linkname)

        # create an external link to each group in the frame file
        f1.create_external_link(data, 'LabelData', filepath + ':/LabelData')
        f1.create_external_link(data, 'SensorData', filepath + ':/SensorData')

    f1.close()
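For completeness, this is roughly how I would read the link file back; dereferencing an ExternalLink node (by calling it) opens the target file. The file and group names here are just examples:

import tables as tb

f = tb.open_file('links.h5', 'r')
link = f.get_node('/frame0/LabelData')  # an ExternalLink node
label_group = link()                    # dereferencing opens the external file
# ... work with label_group ...
f.close()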

Is there a better way?
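For example, would spreading the per-frame groups over numbered subgroups, so that no single group exceeds the recommended number of children, make a difference? A sketch (subgroup naming and bucket size are arbitrary):

import tables as tb

def createLinkFileBucketed(linkfile, filenames, linknames, bucket_size=10000):
    f1 = tb.open_file(linkfile, 'w')

    for i, (filepath, linkname) in enumerate(zip(filenames, linknames)):
        # put at most bucket_size frame groups into each /bucketNNNN group
        bucketname = 'bucket%04d' % (i // bucket_size)
        if '/' + bucketname not in f1:
            f1.create_group('/', bucketname)

        data = f1.create_group('/' + bucketname, linkname)
        f1.create_external_link(data, 'LabelData', filepath + ':/LabelData')
        f1.create_external_link(data, 'SensorData', filepath + ':/SensorData')

    f1.close()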

