9

I have an HDF5 file that contains a 2D table with column names. It shows up as such in HDFView when I look at this object, called results.

It turns out that results is a "compound Dataset", a one-dimensional array where each element is a row. Here are its properties as displayed by HDFView:

Dataset properties

I can get a handle of this object, let's call it res.

The column names are V2pt, R2pt, etc.

I can read the entire array as data, and I can read one element with

res[0,...,"V2pt"].

This will return the number in the first row of column V2pt. Replacing 0 with 1 will return the second row value, etc.

That works if I know the column name a priori. But I don't.

I simply want to get the whole Dataset and its column names. How can I do that?

I see that there is a get_field_info function in the HDF5 documentation, but I find no such function in h5py.

Am I screwed?

Even better would be a solution to read this table as a pandas DataFrame...

germ

1 Answer

14

This is pretty easy to do in h5py and works just like compound types in NumPy. If res is a handle to your dataset, res.dtype.fields.keys() will return all the field names.

If you need to know a specific dtype, something like res.dtype.fields['V2pt'] will give it, as a (dtype, offset) tuple.
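To make this concrete, here is a minimal sketch of reading the field names and loading the table into pandas. The in-memory file and its three rows are made up for illustration (only the names results, V2pt and R2pt come from the question), and it assumes the compound type is flat, with no nested compounds.

```python
import h5py
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: a small compound dataset named
# "results" with columns V2pt and R2pt, held in memory only (nothing is
# written to disk because backing_store=False).
row_type = np.dtype([("V2pt", "f8"), ("R2pt", "f8")])
rows = np.array([(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)], dtype=row_type)

with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    res = f.create_dataset("results", data=rows)

    column_names = res.dtype.names  # tuple of field names, in table order
    data = res[()]                  # whole dataset as a NumPy structured array

# A structured array converts directly: each field becomes a column.
df = pd.DataFrame.from_records(data)

print(column_names)
print(df)
```

With a real file you would open it read-only instead, e.g. h5py.File(path, "r"), and look up the dataset by its name.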

John Readey
  • John, thanks for your answer. However, I have two follow-up questions. – germ May 28 '16 at 06:18
  • 1
    1. The list returned is not in the same order as the table. I guess that means I have to iterate through the list and get every column individually instead of using res[...]. 2. I have another table where your method gives only two columns, let's say the first one is 'minor results'. In reality, the table has many more columns, which show up in HDFView as 'minor results->up->Param1'. These seem to refer to some other table. Any idea how to get those? – germ May 28 '16 at 06:24
  • 1
    dtype.fields returns a dictionary object, which messes up the ordering. You can use dtype.names instead, which returns an ordered tuple of field names. I'm not sure what's going on with the missing columns. Is it a compound type of compound types? In that case you'd need some code to get a flat list of all the field names. – John Readey May 29 '16 at 19:09
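
If it does turn out to be a compound type of compound types, flattening the names could look something like the sketch below. The helper flat_field_names is hypothetical (it is not part of h5py or NumPy), and the nested dtype is a guess at the layout based on the 'minor results->up->Param1' label; the field names down, Param2 and status are invented. It uses plain NumPy, since res.dtype on an h5py dataset is an ordinary NumPy dtype.

```python
import numpy as np

def flat_field_names(dtype, prefix="", sep="->"):
    """Hypothetical helper: list leaf field names of a (possibly nested)
    compound dtype, joining the levels with `sep` as HDFView does."""
    names = []
    for name in dtype.names:              # dtype.names preserves field order
        sub_dtype = dtype.fields[name][0]  # fields[name] is (dtype, offset)
        full_name = prefix + name
        if sub_dtype.names is not None:    # nested compound: recurse into it
            names.extend(flat_field_names(sub_dtype, full_name + sep, sep))
        else:
            names.append(full_name)
    return names

# Assumed layout, for illustration only.
inner = np.dtype([("Param1", "f8"), ("Param2", "i4")])
middle = np.dtype([("up", inner), ("down", inner)])
outer = np.dtype([("minor results", middle), ("status", "i4")])

flat_names = flat_field_names(outer)
print(flat_names)

# Nested columns are reached by indexing one level at a time.
table = np.zeros(2, dtype=outer)
param1 = table["minor results"]["up"]["Param1"]
```

With a real dataset you would call flat_field_names(res.dtype) and index the array returned by res[()] in the same way.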