
I'm trying to overwrite a pandas DataFrame stored in an HDF5 file. Each time I do this, the file size grows even though the stored frame's content is the same. If I use mode='w', I lose all the other records in the file. Is this a bug, or am I missing something?

    import pandas

    df = pandas.read_csv('1.csv')
    for i in range(100):
        # Reopen the store (default append mode) and overwrite the same key
        store = pandas.HDFStore('tmp.h5')
        store.put('TMP', df)
        store.close()

tmp.h5 grows in size on every iteration.

Ninjakannon

1 Answer


Read the big warning at the bottom of this section

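This is how HDF5 works: space is not reclaimed automatically when a node is deleted or overwritten, so repeatedly putting the same key makes the file grow. To get the space back, repack the file with a tool such as ptrepack or h5repack.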

Jeff
  • Thank you very much! Each time I close the store, I run the h5repack tool, and this solves the issue: the size no longer grows. – Sergey Sergienko Oct 13 '15 at 18:27
  • Not so intuitive. Maybe it offers an undelete option, and that's why it keeps growing. I think you have to use a `subprocess` call from Python to shrink it down again, as per this answer [here](https://stackoverflow.com/a/21090432/4288043) – cardamom Mar 13 '18 at 11:09
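
The repacking step mentioned in the comments can be sketched from Python roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes the `ptrepack` command-line tool (shipped with PyTables) is on the PATH, and the helper names `build_repack_command` and `repack`, as well as the `.repacked` temporary suffix, are hypothetical names chosen for illustration.

```python
import os
import shutil
import subprocess


def build_repack_command(src, dst):
    """Return the ptrepack command line that copies src into a fresh file dst."""
    return ["ptrepack", "--chunkshape=auto", "--propindexes", src, dst]


def repack(path):
    """Rewrite a closed HDF5 file so that unreclaimed space is dropped."""
    if shutil.which("ptrepack") is None:
        raise RuntimeError("ptrepack not found; it is installed with PyTables")
    tmp_path = path + ".repacked"  # hypothetical temporary file name
    if os.path.exists(tmp_path):
        os.remove(tmp_path)  # start from a clean destination file
    # Copy every node into a new file; only live data is written there
    subprocess.run(build_repack_command(path, tmp_path), check=True)
    os.replace(tmp_path, path)  # swap the compact copy in for the original
```

Under these assumptions, calling `repack('tmp.h5')` after `store.close()` should bring the file back to roughly the size of a single stored frame. The store has to be closed first, since the file is replaced on disk.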