
I can't query even the simplest DataFrame in an HDFStore:

In [1]:
import pandas as pd
pd.__version__

Out[1]:
'0.15.1'

In [2]:
df = pd.DataFrame.from_dict({'A':[1,2],'B':[100,200], 'C':[42,11]})
df_a = df.set_index('A')
df_a

Out[2]: 
     B   C
A         
1  100  42
2  200  11

In [3]:
store = pd.HDFStore('foo.h5','w')
store.put('bar', df_a, format='table')
store.select('bar', where=["'A' == 1"])
---------------------------------------------------------------------------
ValueError: query term is not valid [[Condition : [None]]]

Querying without set_index yields the same error.

To my surprise, querying a MultiIndexed DataFrame works fine:

In [4]:
df_ab = df.set_index(['A','B'])
df_ab

Out[4]:
        C
A B      
1 100  42
2 200  11

In [5]:
store.put('bar', df_ab, format='table')
store.select('bar', where=["'A' == 1"])

Out[5]:
        C
A B      
1 100  42
roldugin

1 Answer


You need to declare the columns you want to query as data_columns; see the docs.

Further, the query itself should be a string (not a list):

In [1]: df = pd.DataFrame.from_dict({'A':[1,2],'B':[100,200], 'C':[42,11]})

In [2]: df_a = df.set_index('A')

In [3]: df_a
Out[3]: 
     B   C
A         
1  100  42
2  200  11

In [4]: store = pd.HDFStore('foo.h5','w')

In [5]: store.put('bar', df_a, format='table', data_columns=True)

You are querying the index, so refer to it as `index`. Querying the index by its name ('A') is not supported at the moment.

In [7]: store.select('bar','index==1')
Out[7]: 
     B   C
A         
1  100  42
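A minimal sketch of combining both kinds of terms, assuming a recent pandas with PyTables installed (the thread itself used 0.15.1): `index` and a data column can appear in the same `where` string, joined with `&`. The scratch file path is a hypothetical temporary file.

```python
import os
import tempfile

import pandas as pd

df_a = pd.DataFrame({"A": [1, 2], "B": [100, 200], "C": [42, 11]}).set_index("A")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.h5")  # hypothetical scratch file
    with pd.HDFStore(path, mode="w") as store:
        # data_columns=True makes every column (B and C) queryable.
        store.put("bar", df_a, format="table", data_columns=True)
        # 'index' refers to the row index even though it is named "A";
        # parentheses keep each comparison explicit before combining with &.
        both = store.select("bar", where="(index >= 1) & (B > 100)")

print(both)
```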

Columns stored as data_columns can be used in the query:

In [8]: store.select('bar','B==100')
Out[8]: 
     B   C
A         
1  100  42
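`data_columns=True` declares every column; passing a list declares only the columns named in it. The sketch below (again assuming a recent pandas with PyTables and a hypothetical scratch file) declares only `B`, so `B == 100` works while a condition on `C` is rejected with a `ValueError`.

```python
import os
import tempfile

import pandas as pd

df_a = pd.DataFrame({"A": [1, 2], "B": [100, 200], "C": [42, 11]}).set_index("A")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.h5")  # hypothetical scratch file
    with pd.HDFStore(path, mode="w") as store:
        # Declare only B as a data column; C is stored but not queryable.
        store.put("bar", df_a, format="table", data_columns=["B"])
        by_b = store.select("bar", where="B == 100")
        try:
            store.select("bar", where="C == 42")
            c_query_failed = False
        except ValueError:
            # pandas rejects terms that are neither an axis nor a data column
            c_query_failed = True

print(by_b)
print("query on C rejected:", c_query_failed)
```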
Jeff
  • Here is an enhancement issue to add querying index by name: https://github.com/pydata/pandas/issues/9042 – Jeff Dec 08 '14 at 11:22
  • Thank you for the answer, it works! For anybody else looking: specifying the query as a list is still legal syntax. Also, in a `MultiIndex` scenario, querying by `index` uses absolute row numbers rather than the logical index. Querying a `MultiIndex` by level name ('A') works in this case. – roldugin Dec 08 '14 at 23:32
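A sketch of the two points from the last comment, assuming a recent pandas with PyTables installed and a hypothetical scratch file: MultiIndex levels are stored as data columns, so the level name `A` is a valid term, and a list of conditions is combined as if joined with `&`. The sketch uses the unquoted form `A == 1`; the absolute-row-number behaviour of `index` is not exercised here.

```python
import os
import tempfile

import pandas as pd

df_ab = pd.DataFrame({"A": [1, 2], "B": [100, 200], "C": [42, 11]}).set_index(["A", "B"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.h5")  # hypothetical scratch file
    with pd.HDFStore(path, mode="w") as store:
        # No data_columns needed: pandas stores MultiIndex levels as data columns.
        store.put("bar", df_ab, format="table")
        # Query a level by its name.
        by_level = store.select("bar", where="A == 1")
        # A list of conditions is ANDed together.
        by_list = store.select("bar", where=["A == 1", "B == 100"])

print(by_level)
```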