
I'm using HDFStore with pandas / pytables.

After removing a table or object, the size of the hdf5 file does not change. The freed space seems to be reused when more objects are later added to the store, but it can be an issue if a large amount of space is wasted.

I have not found any command in the pandas or pytables APIs that can be used to reclaim this disk space.

Do you know of any mechanism to improve data management in hdf5 files?

jruizaranguren

1 Answer


see here

You need to run ptrepack on it, which rewrites the file.

ptrepack --chunkshape=auto --propindexes --complevel=9 --complib=blosc in.h5 out.h5

as an example (this will also compress the file).
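If you would rather drive this from Python, here is a minimal sketch that shells out to the same command. The helper names `build_ptrepack_command` and `repack_in_place` are hypothetical, not part of pandas or pytables. It assumes `ptrepack` is on your PATH and that the store has been closed before repacking.

```python
import os
import shutil
import subprocess
import tempfile


def build_ptrepack_command(src, dst, complevel=9, complib="blosc"):
    """Build the ptrepack argument list with the flags from the command above."""
    return [
        "ptrepack",
        "--chunkshape=auto",
        "--propindexes",
        f"--complevel={complevel}",
        f"--complib={complib}",
        src,
        dst,
    ]


def repack_in_place(path, **kwargs):
    """Hypothetical helper: rewrite `path` via ptrepack, then swap the result in."""
    if shutil.which("ptrepack") is None:
        raise RuntimeError("ptrepack not found on PATH")
    # Reserve a temporary name in the same directory so os.replace stays on one filesystem.
    target_dir = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(suffix=".h5", dir=target_dir)
    os.close(fd)
    os.remove(tmp)  # let ptrepack create the destination file itself
    try:
        subprocess.run(build_ptrepack_command(path, tmp, **kwargs), check=True)
        os.replace(tmp, path)  # overwrite the original with the repacked copy
    finally:
        # Clean up the temporary file if ptrepack failed part-way.
        if os.path.exists(tmp):
            os.remove(tmp)
```

Calling `repack_in_place("store.h5")` (file name is only an example) would replace the file with its repacked copy. Keep a backup until you have checked the result.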

Jeff
  • Is there a way to call ptrepack from a pytables or pandas API? – derchambers Dec 14 '16 at 05:32
  • @user3645626, not that I could find. I used subprocess.call to run the `ptrepack` utility, though: call(["ptrepack", "-o", "--chunkshape=auto", "--propindexes", "--complevel=9", "--complib=blosc", infilename, outfilename]). I'd be interested to hear if there is another way. – 0_0 Dec 21 '16 at 08:51