I want to modify an existing dataset in an HDF5 file with h5py: read it, change the values, delete the old dataset, and add the new one under the same name.

I use the __delitem__() function to delete the old dataset. It appears to succeed, since the item is removed from the keys of the h5py file, but the file size doubles. Can anyone advise how to actually delete a dataset from an HDF5 file with h5py? Thanks a lot.

This is my code:

import numpy as np
import h5py

# suppose I have hdf5 file names stored in: h5_files

for name in h5_files:
    with h5py.File(name, "a") as f:
        x = f["x_data"]
        np_x = np.array(x)  # read the whole dataset into memory

        # do something to np_x, but keep dtype and shape the same as x.

        f.__delitem__("x_data")  # removes the name "x_data" from the file
        f.create_dataset("x_data", data=np_x)

The original HDF5 file is 997.3 MB. After running the code above, the file is about twice that size: 2.0 GB.

Dong Li
  • A similar question was asked here: http://stackoverflow.com/questions/1124994/removing-data-from-a-hdf5-file. You can use the "repack" tool (h5repack) to recover space in the file. – John Readey Sep 12 '16 at 15:46

1 Answer

I might be wrong, but I think that deleting a dataset only removes its name from the file, while the data itself remains. That would explain why the file size doubles: the old data is still there, and the new dataset is written next to it.

If you really need to "delete" a dataset, copy everything except that dataset to a new HDF5 file. As far as I remember, this was the only workaround I was able to find to achieve the same thing.
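The copy-everything-else workaround could look roughly like the sketch below. The helper name `copy_without` and its parameters are made up for illustration; it relies on h5py's `Group.copy`, and it assumes the objects to skip sit at the top level of the file.

```python
import h5py


def copy_without(src_path, dst_path, skip):
    """Copy every top-level object of src_path into a new file dst_path,
    leaving out the names listed in `skip` (hypothetical helper)."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        # Attributes attached to the file itself are copied by hand.
        for key, value in src.attrs.items():
            dst.attrs[key] = value
        for name in src:
            if name in skip:
                continue  # the skipped data never reaches the new file
            # Group.copy copies datasets and (recursively) groups,
            # together with their attributes.
            src.copy(name, dst)
```

After the copy, the modified array can be written to the new file with `create_dataset("x_data", data=np_x)`, and the new file can then take the place of the old one.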

Note: instead of f.__delitem__("x_data") you can use del f["x_data"].
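Since the question keeps the dtype and shape unchanged, another option is to avoid the deletion entirely and write the new values into the existing dataset. This is only a sketch: the helper name `overwrite_in_place` and the `transform` callback are hypothetical, and it assumes the whole dataset fits in memory.

```python
import h5py
import numpy as np


def overwrite_in_place(path, name, transform):
    """Replace the values of an existing dataset without deleting it
    (hypothetical helper; shape must stay the same)."""
    with h5py.File(path, "a") as f:
        dset = f[name]
        # Read everything, apply the change, and cast back to the stored dtype.
        new_values = np.asarray(transform(dset[...]), dtype=dset.dtype)
        if new_values.shape != dset.shape:
            raise ValueError("transform must keep the dataset shape")
        # Slice assignment writes into the dataset that is already in the file.
        dset[...] = new_values
```

For example, `overwrite_in_place(name, "x_data", lambda a: a * 2)` would double every value of `x_data` in place.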

ziky
  • Yes, if I copy everything except the dataset to a new h5py file, everything works. The reason must be that I only delete the name of the dataset, not the actual data. I tried `del f["x_data"]` and the same problem occurs. – Dong Li Sep 12 '16 at 13:56
  • Yes, that is what I said. ``__delitem__`` and ``del`` do the same thing; using ``del`` was only a suggestion because it looks better, nothing more. I really think there is no way to delete data from an HDF5 file. – ziky Sep 12 '16 at 14:32