
Pandas has a nice interface that makes it easy to store things like DataFrames and Series in an HDF5 file:

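import numpy as np
import pandas as pd

m_size = (100, 10)  # hypothetical shape; m_size is not defined in the original snippet
random_matrix = np.random.randint(0, 11, m_size)  # integers in [0, 10], like the deprecated random_integers(0, 10, ...)
my_dataframe = pd.DataFrame(random_matrix)

store = pd.HDFStore('some_file.h5', complevel=9, complib='bzip2')
store['my_dataframe'] = my_dataframe
store.close()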

But if I try to save some other regular Python objects in the same file, it complains:

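store = pd.HDFStore('some_file.h5', complevel=9, complib='bzip2')  # reopen; the store was closed above

my_dictionary = dict()
my_dictionary['a'] = 2
my_dictionary['b'] = [2, 3, 4]

store['my_dictionary'] = my_dictionary   # <--- ERROR
store.close()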

with:

TypeError: cannot properly create the storer for: [_TYPE_MAP] [group->/parameters (Group) u'',value-><type 'dict'>,table->None,append->False,kwargs->{}]

How can I store regular Python data structures in the same HDF5 file where I store my other pandas objects?

Asked by Amelio Vazquez-Reina

1 Answer


Here's the example from the cookbook: http://pandas.pydata.org/pandas-docs/stable/cookbook.html#hdfstore

You can store arbitrary objects as attributes of a node. I believe there is a 64 KB limit (I think it applies to the total attribute data for that node). The objects are pickled.

In [1]: df = DataFrame(np.random.randn(8,3))

In [2]: store = HDFStore('test.h5')

In [3]: store['df'] = df

# you can store an arbitrary python object via pickle
In [4]: store.get_storer('df').attrs.my_attribute = dict(A = 10)

In [5]: store.get_storer('df').attrs.my_attribute
{'A': 10}
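A self-contained sketch of the same attribute approach, applied to the dictionary from the question and read back after reopening the file. It assumes PyTables is installed; the file name `example.h5` and the attribute name `my_dictionary` are hypothetical choices for illustration:

```python
import numpy as np
import pandas as pd

path = "example.h5"  # hypothetical file name for this sketch

df = pd.DataFrame(np.random.randint(0, 11, size=(5, 3)))
my_dictionary = {"a": 2, "b": [2, 3, 4]}

# Write the DataFrame, then attach the dict to that node as an attribute.
# PyTables pickles the dict because it cannot map it to a native HDF5 type.
with pd.HDFStore(path, mode="w") as store:
    store["my_dataframe"] = df
    store.get_storer("my_dataframe").attrs.my_dictionary = my_dictionary

# Reopen read-only and recover both objects
with pd.HDFStore(path, mode="r") as store:
    df_back = store["my_dataframe"]
    dict_back = store.get_storer("my_dataframe").attrs.my_dictionary

print(dict_back)  # {'a': 2, 'b': [2, 3, 4]}
```

The dictionary has to hang off an existing node, so it lives and dies with that node: removing `my_dataframe` from the store also removes the attribute.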
Edited by lib; answered by Jeff
  • Thanks! By the way, I get `PerformanceWarning`s with some `store` commands. I tried disabling them with `import warnings; warnings.simplefilter(action="ignore", category=PerformanceWarning)`, but I get `NameError: name 'PerformanceWarning' is not defined`. Do you know how to mute them? – Amelio Vazquez-Reina Jul 23 '13 at 20:23
  • Actually, you should pay attention to these. They are telling you that you are storing a data type that PyTables is going to pickle! Try storing it as a table, either with `append` or with `store.put('df', df, table=True)` (spelled `format='table'` in later pandas versions), which uses the `Table` format. It handles things like `nan` and certain dtypes better (the ones that give you a PerformanceWarning in the `Storer` format). See http://pandas.pydata.org/pandas-docs/dev/io.html#table-format – Jeff Jul 23 '13 at 20:42
  • If you really want to mute them, try `from pandas.io.pytables import PerformanceWarning` (later pandas versions expose it as `pandas.errors.PerformanceWarning`), but see my comment above: the warning is there for a reason. – Jeff Jul 23 '13 at 20:52
  • Thank you Jeff. I tried with `store.put('my_dictionary', my_dictionary, table=True)` but I still get the error that I reported in my OP. – Amelio Vazquez-Reina Jul 23 '13 at 20:53
  • Is `my_dictionary` a pandas object? If it is, first do a `store.remove('my_dictionary')`; if it is not a pandas object, then you should use the attribute method above. Tables try to `append` (while `put` always overwrites). – Jeff Jul 23 '13 at 20:58
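Jeff's comments turn on how `put` and `append` treat a table-format node. A small sketch of that difference, assuming PyTables is installed and a pandas version where `format="table"` replaces the older `table=True`; the file name `table_demo.h5` and the toy frame are hypothetical:

```python
import pandas as pd

path = "table_demo.h5"  # hypothetical file name for this sketch
df = pd.DataFrame({"x": [1.0, 2.0, float("nan")], "y": [10, 20, 30]})

with pd.HDFStore(path, mode="w") as store:
    # Write df as an appendable, queryable table
    store.put("df", df, format="table")
    # append adds rows to the existing table node
    store.append("df", df)
    n_after_append = len(store["df"])
    # put replaces the whole node rather than extending it
    store.put("df", df, format="table")
    n_after_put = len(store["df"])

print(n_after_append, n_after_put)  # 6 3
```

Neither call accepts a plain `dict`; only pandas objects go through `put`/`append`, which is why the answer stores the dictionary as a node attribute instead.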