
I would like to programmatically change the data associated with a dataset in an HDF5 file. I can't seem to find a way to either delete a dataset by name (allowing me to add it again with the modified data) or update a dataset by name. I'm using the C API for HDF5 1.6.x, but pointers towards any HDF5 API would be useful.

Barry Wark

2 Answers


According to the user guide:

> HDF5 does not at this time provide an easy mechanism to remove a dataset from a file or to reclaim the storage space occupied by a deleted object.

So simple deletion appears to be out of the question. But the section continues:

> Removing a dataset and reclaiming the space it used can be done with the H5Ldelete function and the h5repack utility program. With the H5Ldelete function, links to a dataset can be removed from the file structure. After all the links have been removed, the dataset becomes inaccessible to any application and is effectively removed from the file. The way to recover the space occupied by an unlinked dataset is to write all of the objects of the file into a new file. Any unlinked object is inaccessible to the application and will not be included in the new file. Writing objects to a new file can be done with a custom program or with the h5repack utility program.

Max Lybbert
  • Thanks. Any idea how PyTables (a Python engine built on top of HDF5) handles this? – Barry Wark Jan 16 '09 at 22:01
  • The documentation for "altering" a table in PyTables is at http://www.pytables.org/moin/HintsForSQLUsers#Alteringatable , but note "(adding a column) is currently not supported in PyTables." – Max Lybbert Jan 20 '09 at 21:45
  • It's strange to get an anonymous downvote five years after answering the question, especially since my answer links to the relevant documentation that clearly stated this was impossible in 2009. Has this ability been added? – Max Lybbert Apr 22 '14 at 20:10
  • @MaxLybbert could you please tell me another way to delete all the values from a dataset and resize the dataset to fit the new values? – Mohini Mhetre Oct 15 '14 at 05:41
  • @MohiniMhetre: I was looking at HDF5 for something I was monkeying with five years ago, but I never got serious about that project. I'm certainly not an HDF5 expert. As far as I remember, it is possible to delete/update data, but the file won't shrink in size even if you remove values. It seems to be more common to simply recreate the file from scratch, using the updated data. – Max Lybbert Oct 16 '14 at 14:06
  • **Update**: I don't know how much you can rely on this, but in my experience using the latest HDF5 library (1.8.10), I find that the file *does* shrink after I use `H5Ldelete`. Whether this is by design or by accident, I do not know. – Owen Nov 04 '18 at 19:37

To delete a dataset in C++, use the following:

```cpp
#include <string>
#include "H5Cpp.h"

H5::H5File m_h5File(pathAndNameToHDF5File, H5F_ACC_RDWR); // The HDF5 C++ file object.
std::string channelName = "/myGroup/myDataset";
herr_t result = H5Ldelete(m_h5File.getId(), channelName.c_str(), H5P_DEFAULT);
```

`result` is non-negative on success and negative on failure: https://support.hdfgroup.org/HDF5/doc/RM/RM_H5L.html#Link-Delete

As @MaxLybbert said, the disk space is not recovered; you have to use the h5repack tool. However, with HDF5 v1.10 the space can be recovered, although the user's guide for this is not ready yet: https://support.hdfgroup.org/HDF5/docNewFeatures/NewFeaturesFileSpaceMgmtDocs.html

pablo_worker