In R, I would like to write a matrix of integers to an HDF5 file (".h5") using the int16 data type. To do so, I am using the rhdf5 package. The documentation says that you should set one of the supported H5 data types when creating the dataset. However, even when I set the int16 data type, the result is always int32. Is it possible to store the data as int16 or uint16?

library(rhdf5)

m <- matrix(1,5,5)
outFile <- "test.h5"
h5createFile(outFile)
h5createDataset(file=outFile,"m",dims=dim(m),H5type = "H5T_NATIVE_INT16")
h5write(m,file=outFile,name="m")
H5close()
h5ls(outFile)

The result is:

[Image: console output of h5ls(outFile)]

user21816
  • I'm not an expert with hdf5, so forgive the naive question: what about that image says that the integer stored is `int32`? If that display were intended to distinguish between 16/32 bit ints, I'd expect `"H5T_NATIVE_INT32"` or `"H5T_NATIVE_INT16"`. – r2evans Feb 07 '23 at 14:06
  • Yes it's only when printing it on the console. Using alternate library `hdf5r` I got a similar result displaying `H5T_INTEGER` for the dataset as shown from the file. But when displaying only the dataset I got expected type `H5T_STD_I16LE` – Billy34 Feb 07 '23 at 14:15
  • @r2evans The image is not very clear as R recognizes only int32. I would expect that the otype column would be "H5T_NATIVE_INT16" as it was defined in the H5type attribute when creating the dataset. When reading the m variable from Matlab it clearly states that the variable is of type int32 and not int16. – user21816 Feb 07 '23 at 14:16
  • @Billy34 so should I use a different library? – user21816 Feb 07 '23 at 14:18
  • Before using another library try to print the dataset `m` and not the file that contains m. Look at my answer – Billy34 Feb 07 '23 at 14:19

2 Answers

I used another library, hdf5r, as I could not find rhdf5:

library(hdf5r)

m <- matrix(1L,5L,5L)
outFile <- h5file("test.h5")
createDataSet(outFile, "m", m, dtype=h5types$H5T_NATIVE_INT16)

print(outFile)

print(outFile[["m"]])

h5close(outFile)

The first print (the file) gives:

Class: H5File
Filename: D:\Travail\Projets\SWM\swm.gps\test.h5
Access type: H5F_ACC_RDWR
Listing:
 name    obj_type dataset.dims dataset.type_class
    m H5I_DATASET        5 x 5        H5T_INTEGER

Here we see that it displays only H5T_INTEGER as the datatype for the dataset m.

The second print (the dataset) gives:

Class: H5D
Dataset: /m
Filename: D:\Travail\Projets\SWM\swm.gps\test.h5
Access type: H5F_ACC_RDWR
Datatype: H5T_STD_I16LE
Space: Type=Simple     Dims=5 x 5     Maxdims=Inf x Inf
Chunk: 64 x 64

We can see that it has the right datatype, H5T_STD_I16LE.
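
If you prefer to check this in code rather than by reading the printed summary, here is a minimal sketch that asks the dataset for its datatype and storage size. It assumes hdf5r's R6 interface (`H5File$new`, `create_dataset`, `get_type`, `to_text`, `get_size`); the temporary file path and the variable names are only for illustration.

```r
library(hdf5r)

# Illustrative temporary file, not the path from the answer above
path <- tempfile(fileext = ".h5")
f <- H5File$new(path, mode = "w")

# Create the dataset with an explicit 16-bit integer type
f$create_dataset("m", robj = matrix(1L, 5L, 5L),
                 dtype = h5types$H5T_NATIVE_INT16)

# Ask the dataset itself for its on-disk datatype
dtype <- f[["m"]]$get_type()
type_text <- dtype$to_text()   # textual name of the stored type
type_size <- dtype$get_size()  # size of one element in bytes

print(type_text)
print(type_size)

f$close_all()
```

A size of 2 bytes per element would indicate a 16-bit type, regardless of how the file-level listing summarises it.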

Billy34

The code you provided works as expected, but it's a limitation of the h5ls() function in rhdf5 that it doesn't report a more detailed data type. As @r2evans points out, it's technically true that it's an integer; you just want a bit more detail than that.

If we run your code and then use the h5ls command-line tool distributed by the HDF Group, we get more information:

library(rhdf5)

m <- matrix(1,5,5)
outFile <- tempfile(fileext = ".h5")
h5createFile(outFile)
h5createDataset(file=outFile,"m", dims=dim(m),H5type = "H5T_NATIVE_INT16")
h5write(m,file=outFile, name="m")

system2("h5ls", args = list("-v", outFile))

## Opened "/tmp/RtmpFclmR3/file299e79c4c206.h5" with sec2 driver.
## m                        Dataset {5/5, 5/5}
##     Attribute: rhdf5-NA.OK {1}
##         Type:      native int
##     Location:  1:800
##     Links:     1
##     Chunks:    {5, 5} 50 bytes
##     Storage:   50 logical bytes, 14 allocated bytes, 357.14% utilization
##     Filter-0:  shuffle-2 OPT {2}
##     Filter-1:  deflate-1 OPT {6}
##     Type:      native short

Here the most important part is the final line, which confirms the datatype is "native short", a.k.a. native int16.
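
If the h5ls command-line tool isn't installed, a rough alternative from within R is sketched below. It assumes that rhdf5's `h5ls()` accepts `all = TRUE` and then returns an extra `dtype` column (that is my recollection of the package, so treat it as an assumption and check `?h5ls`). It also reads the data back to show that R holds it as an ordinary integer, since R has no 16-bit integer type of its own.

```r
library(rhdf5)

m <- matrix(1, 5, 5)
outFile <- tempfile(fileext = ".h5")
h5createFile(outFile)
h5createDataset(file = outFile, "m", dims = dim(m),
                H5type = "H5T_NATIVE_INT16")
h5write(m, file = outFile, name = "m")

# Assumption: all = TRUE adds a 'dtype' column with the stored datatype
info <- h5ls(outFile, all = TRUE)
print(info[, c("name", "dclass", "dtype")])

# Reading back: the values arrive in R's own integer type,
# whatever width was used on disk
x <- h5read(outFile, "m")
print(typeof(x))
```

The width reported after reading the file back into R (or another environment) describes the in-memory copy, so it is worth inspecting the file itself before concluding the data was stored as int32.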

Grimbough