3

in my continuing spree of exotic pandas/HDF5 issues, I encountered the following:

I have a series of non-natural named columns (nb: because of a good reason, with negative numbers being "system" ids etc), which normally doesn't give an issue:

fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'])

however, my select statement does fall over it:

>>> fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'], where=[('a-6', '=', [0, 25, 28])])
blablabla
File "/srv/www/li/venv/local/lib/python2.7/site-packages/tables/table.py", line 1251, in _required_expr_vars
    raise NameError("name ``%s`` is not defined" % var)
NameError: name ``a`` is not defined

Is there any way to work around it? I could rename my negative value from "a-1" to a "a_1" but that means reloading all of the data in my system. Which is rather much! :)

Suggestions are very welcome!

Carst
  • 1,614
  • 3
  • 17
  • 28

1 Answers1

3

Here's a test table

In [1]: df = DataFrame({ 'a-6' : [1,2,3,np.nan] })

In [2]: df
Out[2]: 
   a-6
0    1
1    2
2    3
3  NaN

In [3]: df.to_hdf('test.h5','df',mode='w',table=True)

 In [5]: df.to_hdf('test.h5','df',mode='w',table=True,data_columns=True)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6_kind'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6_dtype'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)

There is a very way, but would to build this into the code itself. You can do a variable substitution on the column names as follows. Here is the existing routine (in master)

   def select(self):
        """
        generate the selection
        """
        if self.condition is not None:
            return self.table.table.readWhere(self.condition.format(), start=self.start, stop=self.stop)
        elif self.coordinates is not None:
            return self.table.table.readCoordinates(self.coordinates)
        return self.table.table.read(start=self.start, stop=self.stop)

If instead you do this

(Pdb) self.table.table.readWhere("(x>2.0)",
      condvars={ 'x' : getattr(self.table.table.cols,'a-6')})
array([(2, 3.0)], 
      dtype=[('index', '<i8'), ('a-6', '<f8')])

e.g. by subsituting x with the column reference, you can get the data.

This could be done on detection of invalid column names, but is pretty tricky.

Unfortunately I would suggest renaming your columns.

Jeff
  • 125,376
  • 21
  • 220
  • 187
  • Once again, thanks Jeff! I wonder if it changed between 0.10/0.11 and 0.12, but I guess with the imminent 0.13 new where clausule it doesn't matter anyway :) It might be a nice idea to put it into the pandas HDF5 documentation though as a warning for new users? As pandas itself doesn't really have issues with it and normal data retrieval works, it's not something that's very obvious until you run into select statement issues... – Carst Oct 09 '13 at 16:48
  • yep...I will do that. Pandas version doesn't matter much (though in 0.13, allowed syntax is much more flexible for where's so its actually a big tricky to figure out), see [here](http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#hdfstore-api-changes) – Jeff Oct 09 '13 at 17:19