I was given a 20GB HDF5 file created with pandas, but unfortunately written in 'fixed' format (rather than 'table'), with each column stored under a separate key. This works nicely for quickly loading a single feature, but it doesn't allow handy table-oriented operations (e.g., statistical analysis or plotting).
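For reference, here is a minimal sketch of how such a file could have been produced; the file name and column names are placeholders, not the actual ones:

```python
import pandas as pd

# Hypothetical reconstruction: each column written to its own key,
# in 'fixed' format (the default for HDFStore.put).
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
with pd.HDFStore("example.h5", mode="w") as store:
    for col in df.columns:
        store.put(col, df[col], format="fixed")  # one key per column
```

With this layout, `pd.read_hdf('example.h5', key='a')` loads one column quickly, but the file as a whole contains multiple datasets.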
Trying to load the file as a whole gives the following error:
ValueError: key must be provided when HDF5 file contains multiple datasets.
f = pd.read_hdf('file_path')

ValueError                                Traceback (most recent call last)
    384         for group_to_check in groups[1:]:
    385             if not _is_metadata_of(group_to_check, candidate_only_group):
--> 386                 raise ValueError('key must be provided when HDF5 file '
    387                                  'contains multiple datasets.')
    388         key = candidate_only_group._v_pathname

ValueError: key must be provided when HDF5 file contains multiple datasets.
Unfortunately the key argument doesn't accept a Python list, so I can't simply load everything at once. Is there a way to convert the HDF5 file from 'fixed' to 'table' format, or to load the whole file into a single DataFrame in one go? At the moment my workaround is to load each column separately and append it to an empty DataFrame.
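The workaround I have in mind looks roughly like this sketch: iterate over the store's keys, collect the per-column objects, and assemble them with a single concat rather than repeated appends, then rewrite the result once in 'table' format. The file names here are placeholders (a small demo file stands in for the real 20GB one), and this assumes each key holds a Series or one-column frame:

```python
import pandas as pd

# Build a small demo file mimicking the layout described above:
# 'fixed' format, one key per column (names are placeholders).
with pd.HDFStore("demo.h5", mode="w") as store:
    store.put("a", pd.Series([1, 2, 3], name="a"), format="fixed")
    store.put("b", pd.Series([4.0, 5.0, 6.0], name="b"), format="fixed")

# Load every key in one pass and assemble with a single concat
# instead of appending columns to an empty DataFrame one by one.
with pd.HDFStore("demo.h5", mode="r") as store:
    parts = {key.lstrip("/"): store[key] for key in store.keys()}
df = pd.concat(parts, axis=1)

# Rewrite once in 'table' format so the whole frame can be read
# back with a single key from then on.
df.to_hdf("demo_table.h5", key="data", format="table")
```

For a 20GB file this still needs the whole frame in memory at once, which is why a streaming conversion (or chunked reads, which 'fixed' format doesn't support) would be preferable.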