
I'm using PyTables to store some images as Array and CArray data types. For each of these images, I also want to store some basic metadata (e.g., EXIF data).

I can imagine a number of approaches to storing both of these data formats, ranging from storing the metadata with the AttributeSet class for each Array/CArray to using a Table for all of the metadata.

My question is: What is the best approach if I want to be able to efficiently query and extract images from the resulting HDF5 file for processing? For example, I'd like to be able to extract the images taken within a certain time window (say, 12-3 pm), process that subset of the data, and then either insert copies back into the file or replace the existing arrays.

Many thanks for the help.

Best,

Nick

[Edit (clarification): I'm currently processing these images as NumPy arrays, and I hope to preserve that functionality]

  • See here for tips and example: http://machinelearninguru.com/deep_learning/data_preparation/hdf5/hdf5.html – cxrodgers Oct 12 '18 at 18:59

1 Answer


My understanding of the PyTables docs suggests the following.

Create a table with one column for each piece of metadata you are interested in. If your images are all the same size, and that size is known when you create the table, add an array column and store the images there directly. If the image sizes vary, instead add a column holding a unique identifier for each image (the functional equivalent of a filename), then create a separate group and store one Array/CArray per image, each named to match its identifier in the metadata table (see the sketch below).
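Here is a minimal sketch of the second (variable-size) layout, assuming PyTables 3.x; the file name, group name, and metadata columns (image_id, timestamp, exposure) are placeholders for whatever EXIF fields you actually care about:

```python
import numpy as np
import tables as tb

# Hypothetical metadata layout -- swap in the EXIF fields you need.
class ImageMeta(tb.IsDescription):
    image_id = tb.StringCol(32)    # node name of the image array
    timestamp = tb.Float64Col()    # capture time as a UNIX timestamp
    exposure = tb.Float32Col()

with tb.open_file("images.h5", mode="w") as h5:
    meta = h5.create_table("/", "metadata", ImageMeta, title="image metadata")
    imgs = h5.create_group("/", "images", title="raw image arrays")

    # Store a couple of dummy images of different sizes as CArrays.
    for i, shape in enumerate([(480, 640), (600, 800)]):
        name = f"img_{i:04d}"
        h5.create_carray(imgs, name, obj=np.zeros(shape, dtype=np.uint8))
        row = meta.row
        row["image_id"] = name
        row["timestamp"] = 1_500_000_000.0 + i * 3600
        row["exposure"] = 0.01
        row.append()
    meta.flush()

    # In-kernel query: select only the rows in a given time window,
    # then fetch the matching image arrays by name.
    t0, t1 = 1_500_000_000.0, 1_500_002_000.0
    for r in meta.where("(timestamp >= t0) & (timestamp < t1)"):
        arr = h5.get_node(imgs, r["image_id"].decode()).read()  # NumPy array
        # ... process arr ...
```

Table.where() evaluates the condition in-kernel (via numexpr), so filtering on the timestamp column stays fast even when the metadata table grows large; pulling the selected images back as NumPy arrays is then just a name lookup in the /images group.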

The other option would be to use a lightweight RDBMS (even SQLite) to store the metadata table, which would allow easy querying/sorting/etc., while keeping the actual image arrays in the HDF5 file.
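A sketch of that split, using Python's built-in sqlite3 module for the metadata and keeping the pixel data in the HDF5 file (again, the file, table, and column names are only illustrative):

```python
import sqlite3
import numpy as np
import tables as tb

# Metadata lives in SQLite; pixel data lives in the HDF5 file.
conn = sqlite3.connect("metadata.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS images (image_id TEXT PRIMARY KEY, timestamp REAL)"
)

with tb.open_file("images.h5", mode="a") as h5:
    if "/images" not in h5:
        h5.create_group("/", "images")

    # Add one image: array goes into HDF5, its metadata row into SQLite.
    name = "img_0000"
    h5.create_carray("/images", name, obj=np.zeros((480, 640), dtype=np.uint8))
    conn.execute("INSERT OR REPLACE INTO images VALUES (?, ?)",
                 (name, 1_500_000_000.0))
    conn.commit()

    # Query in SQL, then fetch the matching arrays from HDF5 by name.
    t0, t1 = 1_500_000_000.0, 1_500_002_000.0
    for (image_id,) in conn.execute(
        "SELECT image_id FROM images WHERE timestamp >= ? AND timestamp < ?",
        (t0, t1),
    ):
        arr = h5.get_node("/images", image_id).read()
        # ... process arr ...

conn.close()
```

The trade-off is that you now have two files to keep in sync, but you gain full SQL for the metadata queries while the image arrays stay chunked and compressible in HDF5.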

troy.unrau