
Is there a quick, easy way to get the number of rows in an HDF5 file that was created with pandas, using pandas itself and without loading the entire file?

Thank you in advance!

Cenoc
  • Did you try a simple `pandas.read_hdf()` followed by `len()` on the column you want? This sort of thing definitely works with `h5py`, but I'm not 100% sure of the reading behavior of PyTables. – John Zwinck Oct 20 '14 at 12:42

2 Answers

In [1]: import numpy as np, pandas as pd

In [2]: pd.DataFrame(np.random.randn(10,10)).to_hdf('test.h5', key='df', mode='w', format='table')

In [3]: store = pd.HDFStore('test.h5')

In [4]: store
Out[4]: 
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/df            frame_table  (typ->appendable,nrows->10,ncols->10,indexers->[index])

In [5]: store.get_storer('df').nrows
Out[5]: 10
Jeff
  • I noticed that for a table in fixed format the above operation gives `None`. An easy work around is `store.get_storer('df').shape[0]` – JoeCondron Aug 13 '18 at 11:45
  • I did not get the `/df ...` info when visualizing `store` in the REPL, but I got it when doing `store.get_storer('df')`. (My dataframe is stored as a `table`, at the key `df`) – Michele Piccolini Jul 30 '20 at 09:49
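
The answer and the fixed-format comment above can be combined into one helper. This is only a minimal sketch: the name `hdf_row_count` is made up for illustration, and the fallback assumes that `shape[0]` is the row count for a fixed-format frame, as the comment reports. I have only reasoned about numeric frames here; object-dtype columns in fixed format are stored differently and may not report a meaningful shape.

```python
import pandas as pd


def hdf_row_count(path, key):
    """Return the row count stored under `key` without reading the data.

    Hypothetical helper: tries `nrows` first (format='table'), then falls
    back to `shape[0]` (format='fixed'), following the comment above.
    """
    with pd.HDFStore(path, mode="r") as store:
        storer = store.get_storer(key)
        nrows = storer.nrows              # set for table-format stores
        if nrows is None:                 # fixed-format stores report None
            shape = storer.shape
            nrows = shape[0] if shape is not None else None
        return nrows
```

Calling `hdf_row_count('test.h5', 'df')` on the file written in the answer should return 10. Opening the store with `mode="r"` inside a `with` block keeps the file read-only and makes sure it is closed afterwards.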

For fixed-format stores, @Jeff's answer didn't give me the right number of rows, so I ended up reading the last row's index label and adding one to get the row count:

import pandas as pd

store = pd.HDFStore('test.h5')
len_df = store.select('df', start=-1).index[0] + 1  # last index label + 1
store.close()

This only works if the dataframe's index is the default integer index (0, 1, ..., n - 1), so that the last label really is the last row number.
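
To see why that caveat matters, here is a small in-memory illustration (no HDF5 file involved). The helper names `index_based_count` and `index_counts_rows` are hypothetical; the sketch just shows the last-label trick overcounting when rows have been filtered out before saving.

```python
import pandas as pd


def index_based_count(df):
    """Row count inferred from the last index label (the trick above)."""
    return df.index[-1] + 1


def index_counts_rows(df):
    """True only when the index is exactly 0, 1, ..., len(df) - 1."""
    return df.index.equals(pd.RangeIndex(len(df)))


full = pd.DataFrame({"x": range(5)})
filtered = full[full["x"] % 2 == 0]   # keeps index labels 0, 2, 4

print(index_based_count(full), len(full))          # 5 5
print(index_based_count(filtered), len(filtered))  # 5 3 (overcounts)
print(index_counts_rows(full), index_counts_rows(filtered))
```

If a frame like `filtered` is what gets written, calling `reset_index(drop=True)` before saving would make the last label match the row count again.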

saeedghadiri