I am trying to build a library which reads complex HDF5 data files in python.
I am running into a problem where, an HDF5 Dataset
somehow implements the default array protocol (sometimes), such that when a numpy array is created from it, it casts to the particular array type.
In [8]: ds
Out[8]: <HDF5 dataset "two_by_zero_empty_matrix": shape (2,), type "<u8">
In [9]: ds.value
Out[9]: array([2, 0], dtype=uint64)
This Dataset
object, implements the numpy array protocol, and when the dataset consists of numbers, it supplies a default array type.
In [10]: np.array(ds)
Out[10]: array([2, 0], dtype=uint64)
However, if the dataset doesn't consist of numbers, but some other objects, as you would expect, it just uses a numpy array of type np.object
:
In [43]: ds2
Out[43]: <HDF5 dataset "somecells": shape (2, 3), type "|O8">
In [44]: np.array(ds2)
Out[44]:
array([[<HDF5 object reference>, <HDF5 object reference>,
<HDF5 object reference>],
[<HDF5 object reference>, <HDF5 object reference>,
<HDF5 object reference>]], dtype=object)
This behavior might seem convenient but in my case it's actually inconvenient since it interferes with my recursive traversal of the data file. Working around it really turns out to be difficult since there a lot of different possible data types which have to be special-cased a little differently depending on whether they are children of objects or arrays of numbers.
My question is this: is there a way to suppress the default array creation protocol, such that I could create an object array out of dataset objects that want to cast to their natural duck types?
That is, I want something like: np.array(ds, dtype=object)
, which will produce an array of [<Dataset object of type int>, dtype=object]
and not [3 4 5, dtype=int]
.
But np.array(ds, dtype=np.object)
throws IOError: Can't read data (No appropriate function for conversion path)
I tried in earnest to google some documentation about the numpy array protocol works, and found a lot, but it doesn't really appear to me that anyone considered the possibility that someone might want this behavior.