I am comparing two different HDF5 files to make sure that they match. I want to create a list of all the datasets in a group of the HDF5 file so that a loop can run through all of them, instead of my entering them manually. I can't seem to find a way to do this. Currently I am getting each dataset with this code:

tdata21 = ft['/PACKET_0/0xeda9_data_0004']

The names of the sets are located in the "PACKET_0" group. Once I arrange all of the datasets, I compare the data in the datasets in this loop:

for i in range(len(data1)):
    print("%d\t%g\t%g" % (i, data1[i], tdata1[i]))
    # record a message for every index where the two datasets differ
    if data1[i] != tdata1[i]:
        x = "data file: data1 \nline:" + str(i) + "\noriginal data:" + str(data1[i]) + "\nreceived data:" + str(tdata1[i]) + "\n\n"
        correct.append(x)

If there is a smarter way to compare HDF5 files I would like to see it as well, but mainly I am just looking for a way to get the names of all of the datasets in the group into a list. Thank you.

  • I know that a similar question exists in this post, but I do not really understand it. If it would work for my case, could someone explain how to use it? [link](http://stackoverflow.com/questions/35953404/listing-datasets-in-a-group-in-hdf5?rq=1) – Nikita Belooussov Jan 06 '17 at 00:36
  • Are you using h5py? Add that tag. numpy as well. – hpaulj Jan 06 '17 at 07:15
  • http://docs.h5py.org/en/latest/high/group.html#dict-interface-and-links covers accessing elements of a group as though it were a dictionary, including the use of `keys()`, `items()`, etc. – hpaulj Jan 06 '17 at 07:38

1 Answer

To get the names of the datasets or groups that exist in an HDF5 group or file, call list() on that group or file. Using your example, you'd have:

datasets = list(ft['/PACKET_0'])

You can also just iterate over them directly, by doing:

for name, data in ft['/PACKET_0'].items():
    # do stuff for each dataset; printing is just a placeholder
    print(name, data)

If you want to compare two datasets for equality (i.e., they have the same data), the easiest way is:

(dataset1[...] == dataset2[...]).all()

Indexing with [...] reads each dataset into a NumPy array (older h5py versions also offered the .value attribute for this, but it was deprecated and then removed in h5py 3.0). The arrays are then compared element-by-element, and the result is True if they match everywhere and False otherwise.
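
One caveat with the elementwise comparison: `==` follows NumPy broadcasting rules, so two datasets of different shapes can still compare as all-True. A minimal sketch of a stricter check using `np.array_equal`, which also requires the shapes to match, is below. The helper name `datasets_match` and the sample arrays are made up for illustration; plain NumPy arrays stand in for h5py datasets here.

```python
import numpy as np


def datasets_match(a, b):
    """Return True only if both inputs have the same shape and the same values."""
    # [...] reads an h5py dataset into a NumPy array; on an ndarray it is just a view.
    return bool(np.array_equal(a[...], b[...]))


# Stand-ins for two datasets with different lengths but the same repeated value.
ones3 = np.array([1.0, 1.0, 1.0])
one1 = np.array([1.0])

# Elementwise == broadcasts the length-1 array, so .all() reports a match.
broadcast_result = bool((ones3 == one1).all())

# array_equal checks the shapes first, so the same pair is reported as different.
strict_result = datasets_match(ones3, one1)
```

Here `broadcast_result` is True while `strict_result` is False, which is usually the behavior you want when checking that two files really hold the same data.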

You can combine these two concepts to compare every dataset in two different files.
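
As a rough sketch of that combination, assuming every member of the group is a dataset (no nested subgroups) and that both files use the same group path: the function names `compare_groups` and `compare_files` are hypothetical helpers, not part of h5py.

```python
import numpy as np


def compare_groups(group_a, group_b):
    """Compare two dict-like groups of datasets.

    Returns (missing, mismatched): names found in only one group, and
    names found in both groups whose data differ.
    """
    # Iterating an h5py group (or a dict) yields the member names.
    names_a, names_b = set(group_a), set(group_b)
    missing = sorted(names_a ^ names_b)
    mismatched = [
        name
        for name in sorted(names_a & names_b)
        # [...] reads each dataset into a NumPy array before comparing.
        if not np.array_equal(group_a[name][...], group_b[name][...])
    ]
    return missing, mismatched


def compare_files(path_a, path_b, group="/PACKET_0"):
    """Open two HDF5 files read-only and compare one group in each."""
    import h5py  # imported here so compare_groups itself only needs NumPy

    with h5py.File(path_a, "r") as fa, h5py.File(path_b, "r") as fb:
        return compare_groups(fa[group], fb[group])
```

With two files on disk you would call something like `missing, mismatched = compare_files("original.h5", "received.h5")`, where both file names are placeholders. Two empty lists mean the group matches in both files.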

bnaecker