1

I am trying to read an HDF file, but no groups show up. I have tried a couple of different methods using tables and h5py, but neither displays the groups in the file. I checked, and the file is 'Hierarchical Data Format (version 5) data' (See Update). The file information is here for reference.

Example data can be found here

import h5py
import tables as tb

hdffile = "TRMM_LIS_SC.04.1_2010.260.73132"

Using h5py:

f = h5py.File(hdffile,'w')
print(f)

Outputs:

< HDF5 file "TRMM_LIS_SC.04.1_2010.260.73132" (mode r+) >
[]

Using tables:

fi=tb.openFile(hdffile,'r')
print(fi)

Outputs:

TRMM_LIS_SC.04.1_2010.260.73132 (File) ''
Last modif.: 'Wed Aug 10 18:41:44 2016'
Object Tree:
/ (RootGroup) ''

Closing remaining open files:TRMM_LIS_SC.04.1_2010.260.73132...done

UPDATE

h5py.File(hdffile,'w') overwrote the file and emptied it.

Now my question is: how do I read an HDF version 4 file into Python, since neither h5py nor tables works?

BenT
  • What @MaxU says... And this will also help you: https://docs.python.org/3/library/functions.html#open See the table: to read a file it is 'r', to write 'w', to append 'a'. Good luck! – Kartik Aug 10 '16 at 20:01

3 Answers

4

How big is the file? I think that doing h5py.File(hdffile,'w') overwrites it, so it's empty. Use h5py.File(hdffile,'r') to read.

I don't have enough karma to reply to @Luke H's answer, but reading it into pandas might not be a good idea. Pandas HDF5 support uses pytables, which is an "opinionated" way of using HDF5. This means that it stores extra metadata (e.g. the index). So I would only use pytables to read the file if it was made with pytables.
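
Before picking a library, it can help to confirm what the file actually is without risking another overwrite. The following is only a minimal sketch using the standard library: it opens the file in binary read mode and compares the first bytes against the HDF5 and HDF4 format signatures. The helper name `sniff_hdf_version` is made up for illustration, and the check is simplified to offset 0 (HDF5 also permits its signature at offsets 512, 1024, and so on when a user block is present).

```python
from pathlib import Path

# Format signatures (magic bytes) expected at the start of the file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
HDF4_SIGNATURE = b"\x0e\x03\x13\x01"


def sniff_hdf_version(path):
    """Return 'hdf5', 'hdf4', or 'unknown' based on the first bytes.

    Hypothetical helper for illustration. Simplification: only offset 0
    is checked, so an HDF5 file with a user block is reported as 'unknown'.
    """
    with open(path, "rb") as fh:  # 'rb' only reads; it cannot truncate the file
        head = fh.read(8)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    if head.startswith(HDF4_SIGNATURE):
        return "hdf4"
    return "unknown"


if __name__ == "__main__":
    hdffile = Path("TRMM_LIS_SC.04.1_2010.260.73132")
    if hdffile.exists():
        # A size near zero suggests the file was emptied by an earlier 'w' open.
        print(hdffile.stat().st_size, "bytes,", sniff_hdf_version(hdffile))
```

If this reports 'hdf4', h5py and pytables are not expected to open the file, which would match a "file signature not found" error.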

user357269
  • Thanks! You are right that the 'w' emptied the file, which is why it was reported as an HDF (version 5) file. I re-downloaded the file and it is in version 4 now. Unfortunately, h5py.File now fails because the file signature is not found. – BenT Aug 10 '16 at 19:06
1

UPDATE:

I would recommend that you first convert your HDF version 4 files to HDF5 / h5 files, as most modern libraries / modules work with HDF version 5...
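
One possible way to do the conversion from Python is to call an external converter. The sketch below assumes that the HDF Group's `h4toh5` command-line tool is installed and on PATH, and that it accepts the input and output file names as its two positional arguments; the function names `build_h4toh5_command` and `convert_hdf4_to_hdf5` are hypothetical and only for illustration.

```python
import shutil
import subprocess


def build_h4toh5_command(src, dst):
    """Build the argument list for the h4toh5 converter (assumed CLI usage)."""
    return ["h4toh5", str(src), str(dst)]


def convert_hdf4_to_hdf5(src, dst):
    """Convert an HDF4 file to HDF5 by running h4toh5, then return dst.

    Raises FileNotFoundError if h4toh5 is not on PATH, and
    subprocess.CalledProcessError if the tool exits with an error.
    """
    if shutil.which("h4toh5") is None:
        raise FileNotFoundError("h4toh5 not found on PATH")
    # check=True turns a non-zero exit status into an exception.
    subprocess.run(build_h4toh5_command(src, dst), check=True)
    return dst


# Example call (requires h4toh5 and the original HDF4 file):
# convert_hdf4_to_hdf5("TRMM_LIS_SC.04.1_2010.260.73132",
#                      "TRMM_LIS_SC.04.1_2010.260.73132.h5")
```

The converted file can then be opened read-only, for example with h5py.File(path, 'r'), so that the data is not overwritten again.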

OLD answer:

Try it this way:

import pandas as pd

store = pd.HDFStore(filename)
print(store)

This should print details about the HDF file, including stored keys, lengths of stored DFs, etc.

Demo:

In [18]: fn = r'C:\Temp\a.h5'

In [19]: store = pd.HDFStore(fn)

In [20]: print(store)
<class 'pandas.io.pytables.HDFStore'>
File path: C:\Temp\a.h5
/df_dc               frame_table  (typ->appendable,nrows->10,ncols->3,indexers->[index],dc->[a,b,c])
/df_no_dc            frame_table  (typ->appendable,nrows->10,ncols->3,indexers->[index])

Now you can read dataframes using the keys from the output above:

In [21]: df = store.select('df_dc')

In [22]: df
Out[22]:
    a   b   c
0  92  80  86
1  27  49  62
2  55  64  60
3  31  66   3
4  37  75  81
5  49  69  87
6  59   0  87
7  69  91  39
8  93  75  31
9  21  15   7
MaxU - stand with Ukraine
  • So from @user357269 I found out that the file was overwritten, that it is in HDF version 4, and that the pandas HDF tools only work with version 5. Thanks though. – BenT Aug 10 '16 at 19:23
  • I don't have hdf4 or the converter installed so I will try that. Thanks. – BenT Aug 10 '16 at 19:52
0

Try using pandas:

import pandas as pd
f = pd.read_hdf('C:/path/to/file')  # the path must be passed as a string

See Pandas HDF documentation here.

This should read in any hdf file as a dataframe you can then manipulate.

Luke H
  • I tried using the pd.read_hdf from pandas but it requires a second argument for a group identifier that I haven't been able to find. – BenT Aug 10 '16 at 19:02
  • That is because there is more than one "pandas object" in the file. You'll need to specify which one (via the "key" argument). I'm sorry I can't help you much more than that. – Luke H Aug 10 '16 at 19:05
  • So from @user357269 I found out that the file was overwritten and that it is in HDF version 4, and from what I gather pandas.read_hdf only works with version 5. – BenT Aug 10 '16 at 19:19