
I would like to parse an HDF5 file that has the following layout:

HDFFile/
    Group1/Subgroup1/DataNDArray
          ...
          /SubgroupN/DataNDArray
    ...
    GroupM/Subgroup1/DataNDArray
          ...
          /SubgroupN/DataNDArray

I am trying to use itertools.product, but I get stuck on what to use for the second iterator. MWE:

from itertools import product
import h5py

hfilename = 'data.hdf'
with h5py.File(hfilename, 'r') as hfile:
    for group, subgroup, dim in product(hfile.itervalues(), ????, range(10)):
        parse(group, subgroup, dim)

Obviously my problem is that the second iterator depends on the value extracted by the first iterator, which isn't available in the same one-liner.

I know that I can do it with for loops or with the following example:

with h5py.File(hfilename, 'r') as hfile:
    for group in hfile.itervalues():
        for subgroup, dim in product(group.itervalues(), range(10)):
            parse(group, subgroup, dim)

but I was wondering if there is a way to do it in one itertools run.


1 Answer


Does the second iterator depend on the extracted value of the first iterator? From your example it seems like there are N subgroups in every group.

A solution with list comprehensions and a generator expression (instead of product) would look like this:

M = 3
N = 2

a = ['Group' + str(m) for m in range(1, M + 1)]
b = ['Subgroup' + str(n) for n in range(1, N + 1)]
c = ('{}/{}/DataNDArray'.format(ai, bi) for ai in a for bi in b)

for key in c:
    print(key)

and prints:

Group1/Subgroup1/DataNDArray
Group1/Subgroup2/DataNDArray
Group2/Subgroup1/DataNDArray
Group2/Subgroup2/DataNDArray
Group3/Subgroup1/DataNDArray
Group3/Subgroup2/DataNDArray

which should be what you want.
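If you want a single loop over the file itself (rather than over key strings), a nested generator expression can supply the second, group-dependent iterable, and product can then combine each pair with range(10). Here is a minimal sketch, assuming a Python 3 h5py API (.values() instead of .itervalues()) and the parse() function from your question:

from itertools import product
import h5py

hfilename = 'data.hdf'
with h5py.File(hfilename, 'r') as hfile:
    # The nested generator expression lets the inner iterable depend on the
    # current group, which product() alone cannot express.
    pairs = ((group, subgroup)
             for group in hfile.values()
             for subgroup in group.values())
    for (group, subgroup), dim in product(pairs, range(10)):
        parse(group, subgroup, dim)  # parse() as defined in the question

Note that product materializes its input iterables into tuples before producing results, so the (group, subgroup) pairs are collected up front; the objects are lightweight h5py handles, so this is usually not a problem.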
