I'm struggling to extract data from an H5 file and save it as a multi-column CSV. As shown in the picture, the structure of the H5 file consists of main groups (Genotypes, Positions, and taxa). The main group Genotypes contains more than 1500 subgroups (partial genotype names), and each subgroup contains sub-subgroups (complete genotype names). There are about 1 million datasets (named calls), one in each sub-subgroup, and I need each of them written to a separate column.

The problem is that when I use h5py (the group.get function) I have to supply the path of every calls dataset. I extracted all paths that end in "calls", but I can't reach all 1 million calls datasets to get them into a CSV file. Could anybody help me extract the calls datasets, which are 8-bit integers, as separate columns in a CSV file?

By running the code in the first answer I get this error:
Traceback (most recent call last):
  File "path/file.py", line 32, in <module>
    h5r.visititems(dump_calls2csv) #NOTE: function name is NOT a string!
  File "path/file.py", line 565, in visititems
    return h5o.visit(self.id, proxy)
  File "h5py_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5o.pyx", line 355, in h5py.h5o.visit
  File "h5py\defs.pyx", line 1641, in h5py.defs.H5Ovisit_by_name
  File "h5py\h5o.pyx", line 302, in h5py.h5o.cb_obj_simple
  File "path/file.py", line 564, in proxy
    return func(name, self[name])
  File "path/file.py", line 10, in dump_calls2csv
    np.savetxt(csvfname, arr, fmt='%5d', delimiter=',')
  File "<array_function internals>", line 6, in savetxt
  File "path/file.py", line 1377, in savetxt
    open(fname, 'wt').close()
OSError: [Errno 22] Invalid argument: 'Genotypes_ArgentineFlintyComposite-C(1)-37-B-B-B2-1-B25-B2-B?-1-B:100000977_calls.csv'
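
One possible reading of this error, assuming the script runs on Windows (the backslashes in the h5py paths of the traceback suggest it does): the file name built from the dataset path contains `?` and `:`, and Windows does not allow those characters in file names. Below is a minimal sketch of cleaning the name before it is handed to `np.savetxt`. The helper `safe_csv_name` is hypothetical, not part of h5py or NumPy, and the input string is simply the name from the error message without its `.csv` extension.

```python
import re

# Characters that Windows does not allow in file names,
# plus the ASCII control characters 0-31.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_csv_name(name, suffix=".csv"):
    """Return `name` with every forbidden character replaced by "_",
    followed by `suffix`. (Hypothetical helper for illustration.)"""
    return _INVALID_CHARS.sub("_", name) + suffix


# Name taken from the error message above, without the ".csv" extension.
bad_name = "Genotypes_ArgentineFlintyComposite-C(1)-37-B-B-B2-1-B25-B2-B?-1-B:100000977_calls"
csvfname = safe_csv_name(bad_name)
print(csvfname)
```

In `dump_calls2csv`, the cleaned string could be passed to `np.savetxt` in place of the original `csvfname`. Two dataset paths that differ only in a replaced character would map to the same file name, so with about 1 million datasets it may be worth checking for collisions.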