9

I've run into a problem grouping with HDFStore, which turned out to extend to selecting rows based on strings that contain the '&' character. This should illustrate the problem:

>>> from pandas import HDFStore, DataFrame
>>> df = DataFrame({'a': ['a', 'a', 'c', 'b', 'test & test', 'c', 'b', 'e'],
...                 'b': [1, 2, 3, 4, 5, 6, 7, 8]})
>>> store = HDFStore('test.h5')
>>> store.append('test', df, format='table', data_columns=True)
>>> df[df.a == 'test & test']
     a              b
4    test & test    5
>>> store.select('test', 'a="test & test"')
Int64Index([], dtype='int64')   Empty DataFrame

Now I'm wondering if I'm missing something from the documentation or if this is a bug.

jan zegan
    Bug; see https://github.com/pydata/pandas/issues/6351. I don't think it's hard to fix: we have a pre-parser that basically substitutes certain expressions, and it needs to not do that inside quotes. – Jeff Feb 14 '14 at 03:22
    This was just merged in, so please give it a try with master! – Jeff Feb 14 '14 at 13:13
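
To make the comment about the pre-parser concrete, here is a minimal sketch of quote-aware substitution. It is not the pandas implementation: the function names, the regex, and the assumption that the substitution in question rewrites `&` to `and` (and `|` to `or`) are all hypothetical, chosen only to show why a quoted `'test & test'` could stop matching and how skipping quoted strings avoids that.

```python
import re

# Hypothetical rewrite table (an assumption for illustration, not pandas' own):
# symbolic boolean operators are turned into keyword operators.
_REPLACEMENTS = {"&": "and", "|": "or"}

# Try to match a whole quoted string first; only if that fails, match a bare
# operator. Quoted strings are captured in group 1, operators in group 2.
_TOKEN = re.compile(r"""("[^"]*"|'[^']*')|([&|])""")


def naive_rewrite(expr):
    """Blind substitution: also rewrites operators inside string literals."""
    for symbol, keyword in _REPLACEMENTS.items():
        expr = expr.replace(symbol, keyword)
    return expr


def rewrite_operators(expr):
    """Rewrite & and | only when they appear outside quoted strings."""
    def _sub(match):
        quoted, op = match.groups()
        if quoted is not None:
            return quoted  # leave string literals exactly as written
        return _REPLACEMENTS[op]

    return _TOKEN.sub(_sub, expr)


query = 'a="test & test" & b>3'
print(naive_rewrite(query))      # a="test and test" and b>3
print(rewrite_operators(query))  # a="test & test" and b>3
```

With the naive version the literal being compared becomes `test and test`, which matches no row, so the selection comes back empty. The sketch ignores escaped quotes inside strings; a tokenizer-based approach would handle those more reliably.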

2 Answers

1

As commented, this is now fixed (since pandas 0.14):

In [11]: df[df.a == 'test & test']
Out[11]:
             a  b
4  test & test  5

In [12]: store.select('test', 'a="test & test"')
Out[12]:
             a  b
4  test & test  5
Andy Hayden
-2

In my opinion, h5py is a much more robust Python module for HDF5 files than pandas: http://www.h5py.org/

AlienAnarchist
    The question is about how to use Pandas. This answer has nothing to do with the problem at hand. – tharen Dec 05 '14 at 02:03