19

I have quite a big dataset, with all the information stored in an HDF5 file. I found the h5py library for Python. Everything works properly except that some values are printed as

[<HDF5 object reference>]

I have no idea how to convert it into something more readable. Can I do it at all? The documentation on this topic is rather hard for me to follow. Maybe there are other solutions in languages other than Python. I appreciate any help I can get.

Ideally, it should be a path to the file.

Here is part of my code:

import numpy as np
import h5py
import time

f = h5py.File('myfile1.mat', 'r')
# print(f.keys())
test = f['db/path']
st = test[3]
print(st)

The output for st is [<HDF5 object reference>]

The output for test is <HDF5 dataset "path": shape (73583, 1), type "|O8">

Instead of [<HDF5 object reference>] I expect something like /home/directory/file1.jpg, if that is possible, of course.

Dmytro Chasovskyi
  • 1
    My question isn't only about the format but, more importantly, about how the data is represented. Maybe I didn't express that clearly in my post, but unfortunately these answers don't really address my question. – Dmytro Chasovskyi Feb 16 '15 at 14:03

3 Answers

37

A friend answered my question, and I realized how easy it was, even though I had spent more than 4 hours on this small problem. The solution is:

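import numpy as np
import h5py
import time

f = h5py.File('myfile1.mat', 'r')
test = f['db/path']
st = test[0][0]   # the HDF5 object reference stored in the first row
obj = f[st]       # dereference it through the file to get the dataset it points to
# The dataset holds integer character codes; flatten it in case it is 2-D (e.g. shape (N, 1))
str1 = ''.join(chr(i) for i in obj[:].flatten())
print(str1)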

I'm sorry if I didn't describe my problem accurately, but this is the solution I was trying to find.
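To see the dereferencing step in isolation, here is a minimal sketch that builds a small in-memory file and reads the strings back. It assumes the layout is the same as in the question: a `db/path` dataset of object references, each pointing to a dataset of integer character codes. The file name `demo.h5`, the `#refs#/...` dataset names, and the helper `deref_string` are made up for illustration, and `h5py.ref_dtype` requires a reasonably recent h5py.

```python
import numpy as np
import h5py


def deref_string(h5file, ref):
    """Follow an object reference and decode the character codes it points to."""
    codes = h5file[ref][()].ravel()          # works for shape (N,) and (N, 1) alike
    return ''.join(chr(int(c)) for c in codes)


paths = ['/home/directory/file1.jpg', '/home/directory/file2.jpg']

# In-memory file (nothing is written to disk); all names here are illustrative.
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    refs = f.create_dataset('db/path', shape=(len(paths), 1), dtype=h5py.ref_dtype)
    for i, p in enumerate(paths):
        # Store each string as a column of uint16 character codes.
        codes = np.array([ord(ch) for ch in p], dtype=np.uint16).reshape(-1, 1)
        target = f.create_dataset('#refs#/{}'.format(i), data=codes)
        refs[i, 0] = target.ref               # store a reference, not the data itself

    # Each row of db/path holds one reference; dereference them all.
    decoded = [deref_string(f, row[0]) for row in f['db/path'][:]]

print(decoded)
```

For the real file, the same helper could be applied row by row to `f['db/path']`, provided the referenced datasets really do hold character codes.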

Dmytro Chasovskyi
  • 2
    Can you explain, what does it mean? – Dims Feb 17 '16 at 21:02
  • 1
    @Dims If I understand correctly, the trouble we're running into is that we have a `<HDF5 object reference>`, in other words, a reference, not the object itself (this reference is what `st` is in the code in the answer). The "object" itself is our string. Since this reference refers to an object in the file that we read (`f`), we do `f[st]`, which returns the actual object (`obj`). Then, to convert this HDF5 object into a string, we iterate over it, take each integer `i`, convert it to a character with `chr(i)`, and join the characters together to get our string – RyanQuey Jul 06 '20 at 10:46
  • This question and answer are similar: https://stackoverflow.com/a/12048685/6952495 – RyanQuey Jul 06 '20 at 11:05
  • 1
    @RyanQuey The questions are siblings, true, but they are not the same (i.e., not duplicates). – Dmytro Chasovskyi Jul 07 '20 at 07:42
  • 1
    @DmytroChasovskyi definitely, I'd agree. Wasn't trying to say they were duplicate, just wanted to tag them as similar for those who were trying to solve something that the other question addressed – RyanQuey Jul 07 '20 at 10:22
3

You can define your own __str__() or __repr__() method for this class, or create a simple wrapper which formats a string with the information you want to see. Based on quick browsing of the documentation, you could do something like

from h5py import File

class MyHDF5File (File):
    def __repr__ (self):
        return '<HDF5File({0})>'.format(self.filename)
tripleee
3

Solution

Derive a class from the HDF5 class and override the __repr__ method.

Explanation

When you print an object, the interpreter calls __str__ on it, which falls back to __repr__; by default, __repr__ returns the class name and the memory location of the instance.

class Person: 
    def __init__(self, name):
        self.name = name

p = Person("Jhon Doe")
print(p)

>>> <__main__.Person object at 0x00000000022CE940>

In your case, you have a list containing a single HDF5 object reference. The equivalent would be:

print([p])
>>> [<__main__.Person object at 0x000000000236E940>]

Now, you can change how objects are printed by overriding the __repr__ method of the class.

Note: You could overwrite __str__ as well, see Difference between str and repr in Python for more detail.

class MyReadablePerson(Person):
    def __init__(self, name):
        super(MyReadablePerson, self).__init__(name)
    def __repr__(self):
        return "A person whose name is: {0}".format(self.name)

p1 = MyReadablePerson("Jhon Doe")
print(p1)

>>> A person whose name is: Jhon Doe
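The difference between __str__ and __repr__ matters here because containers such as lists use __repr__ for their elements, which is why a list of references prints as [<HDF5 object reference>]. A small sketch with a hypothetical `Point` class (not part of h5py) shows both paths:

```python
class Point:
    """Hypothetical class used only to contrast __str__ and __repr__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Used by the interactive prompt and for elements inside containers.
        return "Point(x={!r}, y={!r})".format(self.x, self.y)

    def __str__(self):
        # Used by print() and str() when called on the object directly.
        return "({}, {})".format(self.x, self.y)


p = Point(1, 2)
as_str = str(p)      # goes through __str__
as_list = str([p])   # the list formats its elements with __repr__

print(as_str)
print(as_list)
```

So overriding only __str__ would not change what a printed list shows; overriding __repr__ does.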
Raydel Miranda