3

There is an HDF file 'file.h5', and a pandas DataFrame (or Series) is saved in it under the key 'df'. How can one determine which format (i.e. 'fixed' or 'table') 'df' was saved in?

Thank you for your help!

S.V

2 Answers

2

A bit late, but maybe someone else will find this helpful.

You can parse the output of HDFStore.info(). Objects in table format have the type appendable:

>>> print(h5_table.info())
<class 'pandas.io.pytables.HDFStore'>
File path: /tmp/df_table.h5
/df            frame_table  (typ->appendable,nrows->2,ncols->2,indexers->[index],dc->[])

>>> print(h5_fixed.info())
<class 'pandas.io.pytables.HDFStore'>
File path: /tmp/df_fixed.h5
/df            frame        (shape->[2,2]) 

This is a minimal (i.e. without error handling for missing file or key) example:

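import pandas as pd

def get_hd5_format(path, key):
    with pd.HDFStore(path, mode='r') as store:
        info = store.info()
    # Find the info line for exactly this key; table-format objects are listed as "typ->appendable".
    line = next(l for l in info.splitlines()[2:] if l.split()[:1] == ['/' + key])
    return 'table' if 'typ->appendable' in line else 'fixed'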

Example usage:

>>> get_hd5_format('/tmp/df_table.h5', 'df')
'table'
>>> get_hd5_format('/tmp/df_fixed.h5', 'df')
'fixed'
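
If you also want the missing-key case handled, the parsing step can be separated from the file access. The following is only a sketch: format_from_info is a hypothetical helper name, and it assumes the info() text layout shown above, which may differ between pandas versions. It takes the text returned by store.info() rather than opening the file itself, so it can be tried on the sample output without an HDF file.

```python
def format_from_info(info_text, key):
    """Return 'table' or 'fixed' for `key`, given the text of HDFStore.info()."""
    wanted = '/' + key.lstrip('/')
    for line in info_text.splitlines():
        fields = line.split()
        # Match the key exactly so that 'df' does not also match '/df2'.
        if fields and fields[0] == wanted:
            return 'table' if 'typ->appendable' in line else 'fixed'
    raise KeyError(f'no object named {key!r} in store info')


# Sample text copied from the info() output shown earlier in this answer.
table_info = """<class 'pandas.io.pytables.HDFStore'>
File path: /tmp/df_table.h5
/df            frame_table  (typ->appendable,nrows->2,ncols->2,indexers->[index],dc->[])"""
fixed_info = """<class 'pandas.io.pytables.HDFStore'>
File path: /tmp/df_fixed.h5
/df            frame        (shape->[2,2])"""

print(format_from_info(table_info, 'df'))  # table
print(format_from_info(fixed_info, 'df'))  # fixed
```

A key that is not listed raises KeyError instead of the StopIteration that the one-liner above would produce.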
Stef
0

By default, the format is "fixed", which allows fast reads and writes but is neither appendable nor searchable.

However, you can explicitly specify the format to use when saving to the HDF5 file:

df.to_hdf('file.h5', key='df', mode='w', format='table')

Note: the command above is just a sample chosen to illustrate the format parameter. Set the other parameters as your use case requires.

For further reference, see the pandas documentation page:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html

Hope the above information helps.

  • 2
    Unfortunately, it does **not** help. The problem is to determine the format of a DataFrame in a file **after** it has been saved. I have a large number of data files (some were saved in the 'fixed' format and some in the 'table' format) that I need to process, and for each of them I need to know whether I can use the additional functionality of the 'table' format. – S.V May 29 '18 at 13:19
  • 1
    Yes, not a helpful answer. Strangely, this question is asked online in many places and the responders always reply with some information about SAVING an HDF, not READING an HDF... – Shawn Jun 09 '20 at 14:35