
I was given a 20GB HDF5 file created with pandas, but unfortunately written in the fixed format (rather than table), with each column written as a separate key. This works nicely for quickly loading a single feature, but it makes table-oriented operations (e.g., statistical analysis or plotting) awkward.

Trying to load the file as a whole,

f = pd.read_hdf('file_path')

gives the following error:

ValueError                                Traceback (most recent call last)
    384             for group_to_check in groups[1:]:
    385                 if not _is_metadata_of(group_to_check, candidate_only_group):
--> 386                     raise ValueError('key must be provided when HDF5 file '
    387                                      'contains multiple datasets.')
    388             key = candidate_only_group._v_pathname

ValueError: key must be provided when HDF5 file contains multiple datasets.

Unfortunately 'key' doesn't accept a Python list, so I can't simply load everything at once. Is there a way to convert the h5 file from 'fixed' to 'table' format? Or to load the whole file into a dataframe in one go? At the moment my solution is to load each column separately and append it to an initially empty dataframe, as sketched below.
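
Roughly, my current workaround looks like this (a minimal sketch; 'col1' and 'col2' are placeholders for the real keys, and it assumes each key holds a Series):

import pandas as pd

# current workaround (sketch): read each column by key and append it
# to an initially empty dataframe; 'col1', 'col2' stand in for the real keys
keys = ['col1', 'col2']
df = pd.DataFrame()
for key in keys:
    df[key] = pd.read_hdf('file_path', key=key)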

Shawn

1 Answer


I don't know any other way than loading the df column by column, but you can largely automate this using HDFStore instead of read_hdf:

with pd.HDFStore(filename) as h5:
    df = pd.concat(map(h5.get, h5.keys()), axis=1)

Example:

# save df as multiple datasets (one key per column)
df = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})
df.a.to_hdf('/tmp/df.h5', key='a', mode='w', format='fixed')
df.b.to_hdf('/tmp/df.h5', key='b', mode='a', format='fixed')

# read the columns back and concat them into a dataframe
with pd.HDFStore('/tmp/df.h5') as h5:
    df1 = pd.concat(map(h5.get, h5.keys()), axis=1)

# verify the round trip
assert df1.equals(df)
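
To also address the conversion part of your question: once the dataframe is assembled like this, you could write it back out as a single dataset in table format and load it in one go from then on (the file name and key below are just examples):

# write the combined dataframe as one dataset in 'table' format
df1.to_hdf('/tmp/df_table.h5', key='df', mode='w', format='table')

# afterwards the whole thing can be read with a single call
df2 = pd.read_hdf('/tmp/df_table.h5', key='df')
assert df2.equals(df1)

For a 20GB file, just make sure the concatenated dataframe fits into memory before rewriting it.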
Stef
    You may also find [my answer here](https://stackoverflow.com/a/62364856/3944322) helpful as I've seen you commented on the other answer there. – Stef Jun 13 '20 at 19:50