I have an HDF5 dataset that contains mixed data types (int and float).

When I read it into a NumPy array, the result is reported as an array of type np.void.

import numpy as np
import h5py

f = h5py.File('Sample.h5', 'r')
array = np.array(f['/Group1/Dataset'])
print(array.dtype)

[Image: output of print(array.dtype)]

How can I read this dataset into arrays so that each column keeps the data type it has in the input? Thanks in advance for any reply.

  • As your image shows, the data in the array is of the same type as the HDF5 dataset. In this case, ID and DOMAIN_ID are integers (i8) and the others (XR, YR, ZR, etc.) are reals/floats (f8). A structured array or record array is used when there are mixed data types. (These arrays are similar but not the same.) Now, if you want an ndarray where all data types are the same, you will have to slice a subset of the data from the HDF5 dataset or from the extracted array above (using appropriate indices). I will try to create an example (hard to do without the HDF5 file). – kcw78 Mar 11 '19 at 19:03
  • Just access each `field` (not column) of the array by name. `array['ID']`, `array['RXI']`. – hpaulj Mar 11 '19 at 21:32
  • I would load the dataset with, `arr = f['/Group1/Dataset'][:]` syntax. According to the `h5py` docs that's the preferred way. Either way you get a structured array matching the `dataset` in dtype and layout. – hpaulj Mar 11 '19 at 21:35
  • Does this answer your question? [python h5py: can I store a dataset which different columns have different types?](https://stackoverflow.com/questions/51729840/python-h5py-can-i-store-a-dataset-which-different-columns-have-different-types) – Kermit Feb 16 '21 at 00:11

1 Answer

Here are two simple examples showing both ways to slice a subset of the dataset using the HDF5 field (column) names.

The first method extracts a subset of the data into a structured array by passing the field names when accessing the dataset. The second method follows your current approach: it reads the entire dataset into a structured array, then slices a new view to access a subset of the fields.

Print statements are used liberally so you can see what's going on.

Method 1

# Read only the XR, YR and ZR fields; h5py accepts field names next to the slice
real_array = np.array(f['/Group1/Dataset'][:, 'XR', 'YR', 'ZR'])
print(real_array.dtype)
print(real_array.shape)

Method 2

cmplx_array = np.array(f['/Group1/Dataset'])
print(cmplx_array.dtype)
print(cmplx_array.shape)

disp_real = cmplx_array[['XR','YR','ZR']]
print(disp_real.dtype)
print(disp_real.shape)
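
Both methods still return a structured array. If what you want is one plain array per field, indexing with a single field name is enough, as hpaulj's comment suggests. Here is a minimal sketch that uses NumPy only: `data` is a hypothetical stand-in for the array h5py returns, with field names taken from the question and made-up values.

```python
import numpy as np

# Hypothetical stand-in for the structured array read from the HDF5 file.
# Field names follow the question; the values are invented for illustration.
dt = np.dtype([('ID', 'i8'), ('DOMAIN_ID', 'i8'),
               ('XR', 'f8'), ('YR', 'f8'), ('ZR', 'f8')])
data = np.array([(1, 10, 0.1, 0.2, 0.3),
                 (2, 10, 1.1, 1.2, 1.3)], dtype=dt)

# A single field name gives a plain 1-D array with that field's dtype.
ids = data['ID']
xr = data['XR']
print(ids.dtype, ids.shape)
print(xr.dtype, xr.shape)

# A list of field names gives a structured array limited to those fields.
sub = data[['XR', 'YR', 'ZR']]
print(sub.dtype.names)
print(sub.shape)
```

Note that `sub` is still one-dimensional: each element is a record with three fields, not a row of three floats.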

Review this SO topic for additional insights into copying values from a recarray to an ndarray, and back:

copy-numpy-recarray-to-ndarray
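
For a single homogeneous 2-D ndarray, one option (not necessarily the one the linked topic recommends) is `numpy.lib.recfunctions.structured_to_unstructured`, available since NumPy 1.16. This is a sketch under the assumption that the selected fields all share a dtype; the sample array is hypothetical and its values are made up.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical structured array standing in for the HDF5 data.
dt = np.dtype([('ID', 'i8'), ('XR', 'f8'), ('YR', 'f8'), ('ZR', 'f8')])
data = np.array([(1, 0.1, 0.2, 0.3),
                 (2, 1.1, 1.2, 1.3)], dtype=dt)

# Convert the three float fields into a regular (N, 3) float64 ndarray.
xyz = rfn.structured_to_unstructured(data[['XR', 'YR', 'ZR']])
print(xyz.dtype, xyz.shape)  # float64 (2, 3)

# And back: rebuild a structured array from the plain ndarray.
back = rfn.unstructured_to_structured(xyz, names=['XR', 'YR', 'ZR'])
print(back.dtype.names)
```

Mixing integer and float fields in one call would promote everything to a common dtype, so it is usually cleaner to convert the integer and float fields separately.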

kcw78