I have an HDF5 file with the structure shown below, where {...} represents a group and some groups contain subgroups. The file can be downloaded from this link:

https://drive.google.com/file/d/1f6a0XEPGE4aSEKODVbJ1Q9AUw24Bt9_2/view?usp=sharing

{'A': np.array(...), 
 'B':np.array(...), 
 'C':{
      'A': np.array(...),
      'B': {...},
      'C': np.array(...),
      'D': np.array(...)},
 'D':{
      'A': {...},
      'B': {...},
      'C': {...},
      'D': {...}}
}

I'm trying to build a dictionary with the code below, but the result is not in the correct format. Can someone help with this?

import h5py
import numpy as np
def read_hdf5_file(file):

    for key,val in file.items():
        if type(val) == h5py._hl.dataset.Dataset:
            d[key] = np.array(val)
#             print(key,np.array(val))

        else:
            d[key] = read_hdf5_file(val)
    return d

if __name__=='__main__':
    d = dict()
    file = h5py.File("data.hdf5")
    read_hdf5_file(file)
  • What do you mean by "not in the correct format"? What format is it in, and what should it be instead? – mkrieger1 Nov 21 '21 at 14:22
  • `if type(val) == h5py._hl.dataset.Dataset`: generally, prefer using `isinstance`: `if isinstance(val, h5py._hl.dataset.Dataset):`. In case `val` is a subclass of Dataset, this will also evaluate to true, while not the first variant with `type`. (Unless, of course, you don't want subclasses.) – 9769953 Nov 21 '21 at 14:23
  • @mkrieger1 The output dictionary does not have the same structure as the file. – Arunakiri Venkatachalam Nov 21 '21 at 14:32
  • I suspect I need to use the recursion in a different way – Arunakiri Venkatachalam Nov 21 '21 at 14:35
  • DO NOT post images of code, data, error messages, etc. - copy or type the text into the question. Please reserve the use of images for diagrams or demonstrating rendering bugs, things that are impossible to describe accurately via text. For more information please see the Meta FAQ entry [Why not upload images of code/errors when asking a question?](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question/285557#285557) – itprorh66 Nov 21 '21 at 15:32
  • To recursively descend the object tree you really need to use h5py `.visit()` or `.visititems()` methods. Also, you need to be careful using dataset names as your dictionary keys. You have multiple datasets named 'A', 'B', etc. The `.items()` iterator returns the local name, **_not_** the full path to the dataset (which is what you will need). – kcw78 Nov 21 '21 at 16:54

1 Answer

Here is a very simple example showing how to use `.visititems()` to recursively visit all objects (datasets and groups) in the object tree and build a dictionary of dataset names and h5py objects (where the names are the full paths). Unfortunately, `.visititems()` does not behave like a generator: it stops walking as soon as the callback returns anything other than `None`, so the callback cannot return or yield values to the caller. To get generator-like behavior, you have to "wrap" `.visititems()` in a function that acts as your generator. It's not hard, just more involved. (Or use PyTables; it has a great set of "walk" methods to do this.)
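A minimal sketch of that wrapping idea, assuming it is acceptable to collect the datasets first and yield them afterwards. The helper name `iter_datasets` and the small in-memory demo file are made up for illustration; they are not part of h5py or of the file in the question.

```python
import h5py
import numpy as np


def iter_datasets(h5obj):
    """Yield (full_path, dataset) pairs for every dataset below h5obj.

    Hypothetical helper: it wraps .visititems() by collecting matches in a
    list, because the callback itself cannot yield to the caller.
    """
    found = []

    def collect(name, node):
        # Returning None (implicitly) keeps .visititems() walking the tree.
        if isinstance(node, h5py.Dataset):
            found.append((node.name, node))

    h5obj.visititems(collect)
    yield from found


# Demo on a throwaway in-memory file (nothing is written to disk).
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as h5f:
    h5f.create_dataset("A", data=np.arange(3))
    h5f.create_dataset("C/A", data=np.ones(2))
    h5f.create_dataset("C/B/x", data=np.zeros(1))

    # Consume the generator while the file is still open.
    paths = sorted(path for path, ds in iter_datasets(h5f))
    print(paths)
```

The generator has to be consumed inside the `with` block, since the yielded dataset objects are only usable while the file is open.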

The code is below. (Warning: it is verbose to show what it is doing.)

import h5py

def get_ds_dictionaries(name, node):
    # Callback for .visititems(): called once for every group and dataset.
    # It must return None, otherwise .visititems() stops walking the tree.
    fullname = node.name
    if isinstance(node, h5py.Dataset):
        # node is a dataset
        print(f'Dataset: {fullname}; adding to dictionary')
        ds_dict[fullname] = node
        print('ds_dict size', len(ds_dict))
    else:
        # node is a group
        print(f'Group: {fullname}; skipping')

with h5py.File('data.hdf5', 'r') as h5f:

    # The stored h5py objects are only usable while the file is open.
    ds_dict = {}
    print('**Walking Datasets to get dictionaries**\n')
    h5f.visititems(get_ds_dictionaries)
    print('\nDONE')
    print('ds_dict size', len(ds_dict))
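For the nested dictionary the question asks for, one possible approach (plain recursion rather than `.visititems()`) is to create a fresh dictionary on every call instead of sharing a single global `d`. This is only a sketch: `group_to_dict` is a hypothetical helper name, the demo file is a made-up in-memory stand-in for `data.hdf5`, and it assumes every dataset is small enough to read fully into memory.

```python
import h5py
import numpy as np


def group_to_dict(group):
    """Return a nested dict that mirrors the layout of an h5py group."""
    result = {}  # new dict per call, so each subgroup gets its own level
    for key, node in group.items():
        if isinstance(node, h5py.Dataset):
            result[key] = node[()]  # read the dataset contents into memory
        elif isinstance(node, h5py.Group):
            result[key] = group_to_dict(node)  # recurse into the subgroup
    return result


# Demo on a throwaway in-memory file (nothing is written to disk).
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as h5f:
    h5f.create_dataset("A", data=np.arange(3))
    h5f.create_dataset("C/A", data=np.ones(2))
    h5f.create_dataset("C/B/x", data=np.zeros(1))
    nested = group_to_dict(h5f)

print(nested)
```

Because each level keeps its own dictionary, repeated local names such as 'A' under different groups do not overwrite each other, and the values remain usable after the file is closed since they are in-memory copies.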
kcw78