Once you create an h5py dataset, how do you add or remove specific rows or columns from an NxM array?

My question is similar to this one, but I don't want to blindly truncate or expand the array. When removing, I need to be able to specify the exact row or column to remove.

For adding, I know I have to specify maxshape=(None, None) when creating the initial dataset, but the resize method doesn't seem to let you specify which rows or columns get truncated if you shrink the size.
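To make the behavior concrete, here is a minimal sketch of what resize does on its own. The dataset name 'foo', the file name, and the in-memory "core" driver are assumptions chosen for illustration. Growing adds space at the end of an axis, and shrinking drops whatever sits at the end, with no way to name a particular row or column.

```python
import h5py
import numpy as np

# Assumption: an in-memory file (driver='core') so nothing is written to disk.
with h5py.File('resize_demo.h5', 'w', driver='core', backing_store=False) as f:
    arr = np.arange(12.0).reshape(3, 4)
    # maxshape=(None, None) makes both axes resizable without an upper bound.
    dset = f.create_dataset('foo', data=arr, maxshape=(None, None))

    # Grow by one row; the new row is appended at the end.
    dset.resize((4, 4))
    dset[3, :] = np.full(4, 99.0)
    grown = dset[:]

    # Shrink by two rows; only the trailing rows are dropped.
    dset.resize((2, 4))
    shrunk = dset[:]
```

After the shrink, only the first two original rows remain, regardless of which rows you actually wanted to discard.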

Cœur
Cerin

1 Answer

h5py isn't really designed for doing this. Pandas might be a better library to use, as it's built around the concept of tables.
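For comparison, a rough sketch of the same kind of removal in pandas, assuming the data fits in memory as a DataFrame. The variable names are illustrative only, and this does not touch an HDF5 file.

```python
import numpy as np
import pandas as pd

# A 4x4 table with default integer row and column labels 0..3.
df = pd.DataFrame(np.arange(16.0).reshape(4, 4))

# Drop the row labelled 1, then (separately) the column labelled 1.
without_row = df.drop(index=1)
without_col = df.drop(columns=1)
```

Both calls return a new DataFrame and leave df unchanged.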

Having said that, here's how to do it:

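In [1]: import h5py
   ...: from numpy.random import rand
   ...: f = h5py.File('test.h5', 'w')  # open for writing; recent h5py versions default to read-only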

In [2]: arr = rand(4,4)

In [3]: dset = f.create_dataset('foo',data=arr,maxshape=(2000,2000))

In [4]: dset[:]
Out[4]:
array([[ 0.29732874,  0.59310285,  0.61116263,  0.79950116],
       [ 0.4194363 ,  0.4691813 ,  0.95648712,  0.56120731],
       [ 0.76868585,  0.07556214,  0.39854704,  0.73415885],
       [ 0.0919063 ,  0.0420656 ,  0.35082375,  0.62565894]])

In [5]: dset[1:-1,:] = dset[2:,:]

In [6]: dset.resize((3,4))

In [7]: dset[:]
Out[7]:
array([[ 0.29732874,  0.59310285,  0.61116263,  0.79950116],
       [ 0.76868585,  0.07556214,  0.39854704,  0.73415885],
       [ 0.0919063 ,  0.0420656 ,  0.35082375,  0.62565894]])

This removes row 1 from dset. It does so by assigning rows 2 and 3 to rows 1 and 2, respectively, before shrinking the dataset by one row. Swap the subscripts (and resize to (4,3) instead) to remove column 1. You can easily write a wrapper around this if you're going to be doing it a lot.
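A wrapper along those lines might look like the sketch below. The function name delete_index is hypothetical (it is not part of h5py), and it assumes a 2-D dataset created with a maxshape that allows resizing. Note that the shifted block is read into memory before being written back, so this is only a reasonable approach for datasets of moderate size.

```python
import h5py
import numpy as np


def delete_index(dset, index, axis=0):
    """Hypothetical helper: remove one row (axis=0) or column (axis=1)
    from a resizable 2-D h5py dataset, in place."""
    if axis not in (0, 1):
        raise ValueError("axis must be 0 (row) or 1 (column)")
    n = dset.shape[axis]
    if not 0 <= index < n:
        raise IndexError("index %d out of range for axis of length %d" % (index, n))
    # Shift everything after `index` one step toward the start.
    if index < n - 1:
        if axis == 0:
            dset[index:n - 1, :] = dset[index + 1:n, :]
        else:
            dset[:, index:n - 1] = dset[:, index + 1:n]
    # Drop the now-redundant last row or column.
    dset.resize(n - 1, axis=axis)


# Usage on an in-memory file (driver='core' is an assumption to avoid disk I/O).
with h5py.File('delete_demo.h5', 'w', driver='core', backing_store=False) as f:
    arr = np.arange(16.0).reshape(4, 4)
    dset = f.create_dataset('foo', data=arr, maxshape=(None, None))
    delete_index(dset, 1, axis=0)   # remove row 1
    delete_index(dset, 2, axis=1)   # remove column 2
    result = dset[:]
```

The result should match what np.delete would produce on the in-memory array for the same row and column.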

Yossarian