Adding or removing specific rows or columns in an h5py dataset

Question

Once you create an h5py dataset, how do you add or remove specific rows or columns from an NxM array?

My question is similar to this one, but I don't want to blindly truncate or expand the array. When removing, I need to be able to specify the exact row or column to remove.

For adding, I know I have to specify maxshape=(None, None) when creating the initial dataset, but the resize method doesn't seem to let you specify which rows or columns get truncated if you shrink the size.

score 7 · Answer 1 · answered Apr 30 '14 at 12:51

h5py isn't really designed for doing this. Pandas might be a better library to use, as it's built around the concept of tables.

Having said that, here's how to do it:

In [1]: import h5py, numpy as np; f = h5py.File('test.h5', 'w')

In [2]: arr = np.random.rand(4, 4)

In [3]: dset = f.create_dataset('foo',data=arr,maxshape=(2000,2000))

In [4]: dset[:]
Out[4]:
array([[ 0.29732874,  0.59310285,  0.61116263,  0.79950116],
       [ 0.4194363 ,  0.4691813 ,  0.95648712,  0.56120731],
       [ 0.76868585,  0.07556214,  0.39854704,  0.73415885],
       [ 0.0919063 ,  0.0420656 ,  0.35082375,  0.62565894]])

In [5]: dset[1:-1,:] = dset[2:,:]

In [6]: dset.resize((3,4))

In [7]: dset[:]
Out[7]:
array([[ 0.29732874,  0.59310285,  0.61116263,  0.79950116],
       [ 0.76868585,  0.07556214,  0.39854704,  0.73415885],
       [ 0.0919063 ,  0.0420656 ,  0.35082375,  0.62565894]])

This removes row 1 (the second row, counting from zero) from dset. It does so by copying rows 2 and 3 into positions 1 and 2, respectively, before shrinking the dataset by one row. Swap the subscripts, i.e. dset[:,1:-1] = dset[:,2:] followed by dset.resize((4,3)), to remove column 1 instead. You can easily write a wrapper around this if you're going to be doing it a lot.
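
For what it's worth, such a wrapper might look roughly like the sketch below. The names delete_index and insert_index are made up for illustration and are not part of h5py. It assumes a 2-D dataset created with a maxshape large enough for the resize, and it relies on h5py reading the right-hand-side slice into memory before writing it back. The in-memory file in the usage part (driver='core', backing_store=False) is only there so the example leaves nothing on disk.

```python
import h5py
import numpy as np


def delete_index(dset, index, axis=0):
    """Remove one row (axis=0) or column (axis=1) from a resizable 2-D dataset.

    Hypothetical helper, not an h5py API.
    """
    n = dset.shape[axis]
    if not 0 <= index < n:
        raise IndexError(f"index {index} out of range for axis {axis} of size {n}")
    if index < n - 1:
        # Shift everything after `index` one step towards the start.
        if axis == 0:
            dset[index:n - 1, :] = dset[index + 1:n, :]
        else:
            dset[:, index:n - 1] = dset[:, index + 1:n]
    # Drop the now-redundant last row/column.
    dset.resize(n - 1, axis=axis)


def insert_index(dset, index, values, axis=0):
    """Insert `values` as a new row (axis=0) or column (axis=1) at `index`.

    Hypothetical helper, not an h5py API.
    """
    n = dset.shape[axis]
    if not 0 <= index <= n:
        raise IndexError(f"index {index} out of range for axis {axis} of size {n}")
    # Grow by one, then shift the tail one step towards the end.
    dset.resize(n + 1, axis=axis)
    if axis == 0:
        if index < n:
            dset[index + 1:n + 1, :] = dset[index:n, :]
        dset[index, :] = values
    else:
        if index < n:
            dset[:, index + 1:n + 1] = dset[:, index:n]
        dset[:, index] = values


# Usage: an in-memory file, so nothing is written to disk.
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    dset = f.create_dataset('foo', data=np.random.rand(4, 4),
                            maxshape=(None, None))
    delete_index(dset, 1, axis=0)               # drop row 1
    delete_index(dset, 1, axis=1)               # drop column 1
    insert_index(dset, 0, np.zeros(3), axis=0)  # new first row
    final_shape = dset.shape
```

Each call reads and rewrites everything after the affected index, so this is fine for occasional edits but gets slow if you do it repeatedly on a large dataset.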
