I'll preface this question by noting that I'm happy to consider alternatives to pytables, but I would prefer to use pytables in order to benefit from the numexpr features.

I'm looking for a way to store, explore, and analyze data of the following form: suppose I have many Event objects, each representing an experimental measurement at a certain instant in time. Each Event contains some scalar fields, as well as a variable number of Particle objects, each of which contains some scalar fields of its own. See my "drawing" below.

My first thought was to have each Event as a row in a table. I understand that there is a VLArray type in pytables, but it seems that these can only store primitive data types. Is there some way to store this data with pytables?

I also considered having each Event be its own group, with a Particle table containing a variable number of rows. However, I anticipate many millions of Events, and I would like to be able to e.g. select events and plot certain fields, as one would do with rows in a table.

If it's not possible to accomplish this with pytables, what are some alternative solutions?

    +-------------------------+
  +-------------------------+ |
+--------- Event ---------+ | |
|  timestamp    (int)     | | |
|  temperature  (float)   | | |
|  latitude     (float)   | | |
|  longitude    (float)   | | |
|  ... [etc]...           | | |
|                         | | |
|  +-- Particle<1> --+    | | |
|  |  idx    (int)   |    | | |
|  |  energy (float) |    | | |
|  |  x      (float) |    | | |
|  |  y      (float) |    | | |
|  |  z      (float) |    | | |
|  |  ... [etc] ...  |    | | |
|  +-----------------+    | | |
|          ...            | | |
|  +-- Particle<N> --+    | | |
|  |  idx    (int)   |    | | |
|  |  energy (float) |    | | |
|  |  x      (float) |    | | |
|  |  y      (float) |    | | |
|  |  z      (float) |    | | |
|  |  ... [etc] ...  |    | | +
|  +-----------------+    | +
+-------------------------+
chase

1 Answer

You could do it database-style, with two tables. One for Events:

    +-------------------------+
  +-------------------------+ |
+--------- Event ---------+ | |
|  timestamp    (int)     | | |
|  temperature  (float)   | | |
|  latitude     (float)   | | +
|  longitude    (float)   | +
+-------------------------+

And one for Particles, with a "foreign key":

    +-------------------------+
  +-------------------------+ |
+-------- Particle -------+ | |
|  event        (?)       | | |
|  idx          (int)     | | |
|  energy       (float)   | | |
|  x            (float)   | | |
|  y            (float)   | | |
|  z            (float)   | | +
|  ... [etc] ...          | +
+-------------------------+

The type of Particle.event could be an int index into the Events table, or it could be matched against an id column added to Events, or it could even be an object type that is actually a pointer to an Event. Each option has different trade-offs.
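As a rough sketch of the "id column" variant in PyTables, something like the following could work. The `event_id` column, the file name, and the sample values are assumptions made up for illustration, and only a few of the fields from the question are included. The file uses the in-memory HDF5 driver, so nothing is written to disk.

```python
import tables as tb


class Event(tb.IsDescription):
    event_id = tb.Int64Col()       # hypothetical id column added to Events
    timestamp = tb.Int64Col()
    temperature = tb.Float64Col()


class Particle(tb.IsDescription):
    event_id = tb.Int64Col()       # "foreign key": matches Event.event_id
    idx = tb.Int32Col()
    energy = tb.Float64Col()


def build_and_query():
    # In-memory HDF5 file; the name is only a label here.
    with tb.open_file("events_sketch.h5", "w", driver="H5FD_CORE",
                      driver_core_backing_store=0) as h5:
        events = h5.create_table("/", "events", Event)
        particles = h5.create_table("/", "particles", Particle)

        # (event_id, timestamp, temperature, particle energies) - made-up data
        sample = [
            (0, 1000, 295.0, [1.5, 2.5]),
            (1, 1010, 305.5, [4.0]),
            (2, 1020, 310.0, [0.5, 7.0, 3.0]),
        ]
        erow, prow = events.row, particles.row
        for event_id, timestamp, temperature, energies in sample:
            erow["event_id"] = event_id
            erow["timestamp"] = timestamp
            erow["temperature"] = temperature
            erow.append()
            # One Particle row per particle, tagged with its event's id.
            for idx, energy in enumerate(energies):
                prow["event_id"] = event_id
                prow["idx"] = idx
                prow["energy"] = energy
                prow.append()
        events.flush()
        particles.flush()

        # Select events with a numexpr-style condition ...
        hot_ids = events.read_where("temperature > 300.0")["event_id"].tolist()
        # ... then fetch the particles belonging to each selected event.
        energies_by_event = {
            eid: particles.read_where(
                "event_id == eid", condvars={"eid": eid})["energy"].tolist()
            for eid in hot_ids
        }
        return hot_ids, energies_by_event


hot_ids, energies_by_event = build_and_query()
```

With many millions of rows, the per-event lookup would probably benefit from an index on the key column, e.g. `particles.cols.event_id.create_index()`.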

You could also "de-normalize" by copying the Event data into every Particle record. This may provide better performance for some use cases, at the expense of some redundancy in the stored data.
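A minimal sketch of the de-normalized layout in PyTables might look like this. The table name, file name, and sample rows are invented for illustration, and only a few fields are shown. Because the Event fields are repeated on every Particle row, one condition can mix Event and Particle columns without any join.

```python
import tables as tb


class FlatParticle(tb.IsDescription):
    # Event fields, repeated on every particle row
    timestamp = tb.Int64Col()
    temperature = tb.Float64Col()
    # Particle fields
    idx = tb.Int32Col()
    energy = tb.Float64Col()


def query_flat():
    # In-memory HDF5 file; nothing is written to disk.
    with tb.open_file("flat_sketch.h5", "w", driver="H5FD_CORE",
                      driver_core_backing_store=0) as h5:
        flat = h5.create_table("/", "particles_flat", FlatParticle)

        # (timestamp, temperature, idx, energy) - made-up data
        rows = [
            (1000, 295.0, 0, 1.5),
            (1000, 295.0, 1, 2.5),
            (1010, 305.5, 0, 4.0),
            (1020, 310.0, 0, 0.5),
            (1020, 310.0, 1, 7.0),
        ]
        row = flat.row
        for timestamp, temperature, idx, energy in rows:
            row["timestamp"] = timestamp
            row["temperature"] = temperature
            row["idx"] = idx
            row["energy"] = energy
            row.append()
        flat.flush()

        # One query combining an Event field and a Particle field.
        hits = flat.read_where("(temperature > 300.0) & (energy > 2.0)")
        return [(int(r["timestamp"]), int(r["idx"]), float(r["energy"]))
                for r in hits]


flat_hits = query_flat()
```

The cost is that every Event field is stored once per particle, so the redundancy grows with the number of particles per event.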

John Zwinck