I want to combine the datasets within a single HDF5 file to form one dataset in a separate file, but I am struggling to set the dtype of the new dataset. I get the error AttributeError: 'Group' object has no attribute 'dtype' on the line ds_0_dtype = h5f1[ds].dtype. The code is below (based on some example code posted on Stack Overflow):

import sys
import h5py

with h5py.File('xxx_xxx_signals.hdf5', 'r') as h5f1, \
     h5py.File('file2.h5', 'w') as h5f2:

    # First pass: check that dtypes and row counts match, and total the columns
    for i, ds in enumerate(h5f1.keys()):
        if i == 0:
            ds_0 = ds
            ds_0_dtype = h5f1[ds].dtype  # AttributeError is raised here
            n_rows = h5f1[ds].shape[0]
            n_cols = h5f1[ds].shape[1]
        else:
            if h5f1[ds].dtype != ds_0_dtype:
                print(f'Dset 0:{ds_0}: dtype:{ds_0_dtype}')
                print(f'Dset {i}:{ds}: dtype:{h5f1[ds].dtype}')
                sys.exit('Error: incompatible dataset dtypes')

            if h5f1[ds].shape[0] != n_rows:
                print(f'Dset 0:{ds_0}: shape[0]:{n_rows}')
                print(f'Dset {i}:{ds}: shape[0]:{h5f1[ds].shape[0]}')
                sys.exit('Error: incompatible dataset shape')

            n_cols += h5f1[ds].shape[1]
        prev_ds = ds

    h5f2.create_dataset('ds_xxxx', dtype=ds_0_dtype, shape=(n_rows, n_cols), maxshape=(n_rows, None))

    # Second pass: copy each dataset into its block of columns
    first = 0
    for ds in h5f1.keys():
        xfer_arr = h5f1[ds][:]
        last = first + xfer_arr.shape[1]
        h5f2['ds_xxxx'][:, first:last] = xfer_arr[:]
        first = last
New Dev
coder411

1 Answer

You likely have one or more Groups in addition to Datasets at the root level. h5f1.keys() returns the names of all nodes, which can be Datasets or Groups, and a Group has no dtype attribute. You need to add a test to skip over Groups. You can do this with an isinstance() test. Something like this:

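else:
    # The same check is also needed in the `if i == 0` branch,
    # since the first node returned by keys() may itself be a Group.
    if not isinstance(h5f1[ds], h5py.Dataset):
        print(f'Node {i}:{ds}: is not a dataset')
        sys.exit('Error: unexpected Group; only Datasets expected')
    if h5f1[ds].dtype != ds_0_dtype: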
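
As a fuller illustration, here is a minimal sketch of the whole combine step with the isinstance() test applied up front, so that Groups are skipped before any dtype or shape is read. The function name combine_root_datasets and its parameters are hypothetical (not from the original code), and it assumes every root-level dataset is 2-D with the same number of rows:

```python
import h5py


def combine_root_datasets(src_path, dst_path, dst_name="ds_xxxx"):
    """Concatenate all root-level 2-D datasets column-wise, skipping Groups.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    with h5py.File(src_path, "r") as h5f1, h5py.File(dst_path, "w") as h5f2:
        # Keep only nodes that are Datasets; Groups have no dtype or shape.
        names = [n for n in h5f1.keys() if isinstance(h5f1[n], h5py.Dataset)]
        if not names:
            raise ValueError("no root-level datasets found")

        # First pass: check compatibility and total the columns.
        dtype = h5f1[names[0]].dtype
        n_rows = h5f1[names[0]].shape[0]
        n_cols = 0
        for n in names:
            if h5f1[n].dtype != dtype or h5f1[n].shape[0] != n_rows:
                raise ValueError(f"incompatible dataset: {n}")
            n_cols += h5f1[n].shape[1]

        out = h5f2.create_dataset(dst_name, dtype=dtype,
                                  shape=(n_rows, n_cols),
                                  maxshape=(n_rows, None))

        # Second pass: copy each dataset into its block of columns.
        first = 0
        for n in names:
            arr = h5f1[n][:]
            last = first + arr.shape[1]
            out[:, first:last] = arr
            first = last
        return names
```

This version silently skips Groups instead of exiting. If a Group at the root level means something is wrong with the input file, raising an error there (as in the snippet above) may be the safer choice.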

Once you know how to identify Groups, you can also modify the code to skip them, so they are not copied to the second file. However, that may not be your desired result. I have an extended SO post on using isinstance(); see: Is there a way to get datasets in all groups at once in h5py?
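
If the datasets you want sit inside Groups rather than at the root level, one option is to walk the file with h5py's visititems() and apply the same isinstance() test to each node. The sketch below is only an illustration of that idea; the function name list_all_datasets is hypothetical:

```python
import h5py


def list_all_datasets(h5_path):
    """Return the path of every Dataset in the file, at any depth.

    Hypothetical helper: the name is illustrative only.
    """
    found = []

    def collect(name, obj):
        # visititems() calls this once for every node below the root;
        # returning None lets the traversal continue.
        if isinstance(obj, h5py.Dataset):
            found.append(name)

    with h5py.File(h5_path, "r") as h5f:
        h5f.visititems(collect)
    return found
```

The returned paths are relative to the root (for example 'group_name/dataset_name') and can be used directly as keys on the open file object.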

kcw78
  • Thank you for the response. I managed to extract the data and put it together using np.concatenate which has been much easier for me. I am trying to pre-process some raw accelerometer data for a Human Activity Recognition application so this is very difficult for me at the start. – coder411 Mar 09 '21 at 22:39