0

Is it possible to select rows from a table in PyTables and apply a Numexpr-like expression to the output? For example, if I have the table

import tables as tb

class Event(tb.IsDescription):
    x = tb.Float32Col()
    y = tb.Float32Col()
    z = tb.Float32Col()

I would like the array of "x+y" where "z > 10.0".

DSM
xvtk

2 Answers

1

Yes, it is very possible. PyTables provides the Expr class to do exactly this. It uses numexpr under the covers.
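As a rough sketch of what using Expr could look like here: the snippet below evaluates "x + y" over the table's columns with tb.Expr and then applies the "z > 10.0" condition as a boolean mask in NumPy. The file name, node name, and sample rows are made up for illustration, and the in-memory driver is only used so that nothing is written to disk. Note that Expr by itself evaluates the whole columns; the selection is a separate step in this sketch.

```python
import numpy as np
import tables as tb


class Event(tb.IsDescription):
    x = tb.Float32Col()
    y = tb.Float32Col()
    z = tb.Float32Col()


# Hypothetical in-memory file and sample data, for illustration only.
h5 = tb.open_file("expr_demo.h5", "w", driver="H5FD_CORE",
                  driver_core_backing_store=0)
table = h5.create_table("/", "events", Event)
table.append([(1.0, 2.0, 5.0), (3.0, 4.0, 11.0), (5.0, 6.0, 20.0)])
table.flush()

# Evaluate x + y over the full columns; numexpr does the work internally.
expr = tb.Expr("x + y", uservars={"x": table.cols.x, "y": table.cols.y})
sums = expr.eval()  # NumPy array, one entry per table row

# Apply the selection afterwards with an ordinary NumPy mask.
mask = table.cols.z[:] > 10.0
values = sums[mask]

h5.close()
```

This computes the sum for every row, including those that are later discarded, so it is only reasonable when a large share of the rows match the condition.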

Anthony Scopatz
  • How? Normally, I would have to do something like this `values = [x['x'] + x['y'] for x in table.where('(z > 10.0)')]`. How could Numexpr be used instead of `x['x'] + x['y']`? – xvtk Feb 20 '14 at 13:58
0

I think what you are attempting is closely related to what I was trying to find out here. I don't think the numexpr calculation can easily be combined with the in-kernel query. One possibility is to use numexpr's where expression on the whole table, setting the result for rows that do not satisfy your condition to 0 (or to some other value that cannot occur in your calculation), and then to run a where query with a condition that filters out the 0s.

However, this incurs a massive overhead if your criterion is highly selective (few hits). In that case, if the selected data fits into memory, you can run the in-kernel query first and then use Numexpr on the arrays loaded into memory.
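A minimal sketch of that last approach, assuming the matching rows fit into memory: Table.read_where runs the in-kernel query and returns a structured NumPy array, and numexpr.evaluate is then applied to its fields. The file name, node name, and sample rows are made up for illustration.

```python
import numexpr as ne
import tables as tb


class Event(tb.IsDescription):
    x = tb.Float32Col()
    y = tb.Float32Col()
    z = tb.Float32Col()


# Hypothetical in-memory file and sample data, for illustration only.
h5 = tb.open_file("query_demo.h5", "w", driver="H5FD_CORE",
                  driver_core_backing_store=0)
table = h5.create_table("/", "events", Event)
table.append([(1.0, 2.0, 5.0), (3.0, 4.0, 11.0), (5.0, 6.0, 20.0)])
table.flush()

# In-kernel query: only rows with z > 10.0 are loaded into memory.
rows = table.read_where("z > 10.0")

# Numexpr on the in-memory fields of the selected rows.
values = ne.evaluate("x + y", local_dict={"x": rows["x"], "y": rows["y"]})

h5.close()
```

Compared with the list comprehension over table.where(...), this avoids the per-row Python loop, at the cost of holding all selected rows in memory at once.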

Ben K.