
I am trying to compare the contents of two binary files. I am using Python 3.6 and filecmp to compare files with the same name inside two directories.

results_dummy=filecmp.cmpfiles(dir1, dir2, common, shallow=True)
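For context, `common` is just the list of file names present in both directories, and `cmpfiles` returns three lists (match, mismatch, errors). Roughly what I do, with placeholder directory names:

    import filecmp
    import os

    dir1 = 'run_A'   # placeholder directory names
    dir2 = 'run_B'

    # compare only the file names that exist in both directories
    common = sorted(set(os.listdir(dir1)) & set(os.listdir(dir2)))

    # shallow=False forces a byte-by-byte comparison instead of
    # relying only on os.stat() information
    match, mismatch, errors = filecmp.cmpfiles(dir1, dir2, common, shallow=False)
    print('match:   ', match)
    print('mismatch:', mismatch)
    print('errors:  ', errors)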

The above line works for the *.bin files I have in both directories, but it does not work with the .h5 files.

When comparing two HDF5 files that contain exactly the same groups/datasets and numerical data, filecmp.cmpfiles reports them as a mismatch.

Is there any way to compare the contents of two HDF5 files from within a Python script, without using h5diff?

Thanks in advance,

Heli
  • Are your HDF5 files binary identical (byte for byte)? All `filecmp` can do is compare raw file contents, without interpreting the data. – randomir Nov 16 '17 at 16:35
  • @randomir `cmp -b file1.h5 file2.h5` says the files are NOT binary equal. The two files have the same contents, so the difference is presumably related to HDF5's internal format. Checking whether the two files contain the same content is the only thing I need. Is there any way to check this from Python? – Heli Nov 16 '17 at 18:12
  • A quick Google search led me to [this project on GitHub](https://github.com/NeurodataWithoutBorders/diff). It uses `h5py` to load both files and compare the contents (a rough sketch of that approach follows this comment thread). The answer to a [similar question on SO](https://stackoverflow.com/questions/41850082/comparing-h5-files) proposes a tool `hdiff`, but the links are dead. Also, if you are aware of `h5diff`, why not use it (from Python)? – randomir Nov 16 '17 at 18:39
  • @randomir I did not want to use h5diff because I was trying to avoid requiring users to install the HDF5 tools in order to run the script. – Heli Nov 17 '17 at 10:06
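A minimal sketch of the `h5py`-based comparison suggested in the comments above, assuming the files only contain groups and datasets (attributes are ignored); the function name `h5_equal` is hypothetical:

    import h5py
    import numpy as np

    def h5_equal(path1, path2):
        """Recursively compare the group structure and dataset values of two HDF5 files."""
        def compare(obj1, obj2):
            if sorted(obj1.keys()) != sorted(obj2.keys()):
                return False
            for key in obj1.keys():
                item1, item2 = obj1[key], obj2[key]
                if isinstance(item1, h5py.Group):
                    # descend into subgroups
                    if not isinstance(item2, h5py.Group) or not compare(item1, item2):
                        return False
                else:
                    # read full datasets into memory and compare element-wise
                    if not isinstance(item2, h5py.Dataset):
                        return False
                    if not np.array_equal(item1[()], item2[()]):
                        return False
            return True

        with h5py.File(path1, 'r') as f1, h5py.File(path2, 'r') as f2:
            return compare(f1, f2)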

1 Answer


I finally settled on using h5diff. Users of the script will need to install the HDF5 tools to run it, though.
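For anyone else, h5diff can be called from Python with `subprocess`. A minimal sketch, assuming h5diff is on the PATH and follows the usual exit-code convention (0 = no differences, 1 = differences found, 2 = error):

    import subprocess

    def h5diff_equal(path1, path2):
        """Return True if h5diff reports no differences between the two files."""
        result = subprocess.run(['h5diff', path1, path2],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if result.returncode == 2:
            # h5diff itself failed (e.g. missing file, unreadable file)
            raise RuntimeError(result.stderr.decode())
        return result.returncode == 0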

Thanks for your answers,

Heli