
I'm having trouble writing strings to groups in HDF5 files with rhdf5's low-level API, specifically the functions:

  • H5Fcreate
  • H5Gcreate
  • H5Screate_simple
  • H5Dcreate
  • H5Dwrite

Here's my data:

# strings
v <- c("val1", "val2", "cat", "dog")

I'd like this vector v to live inside an HDF5 group called "metadata". Here, I try to write this 1-D character array to the file:

filename <- '/tmp/test.hdf5'

if(file.exists(filename)) {
  file.remove(filename)
}

h5createFile(filename)
fid <- H5Fcreate(filename)    
g2 <- H5Gcreate(fid, "/metadata") 
dtype <- "H5T_C_S1"
sid <- H5Screate_simple(NROW(v))
g <- H5Dcreate(g2, "v", dtype, sid) 
H5Dwrite(g, v, h5spaceMem = sid, h5spaceFile = sid)
H5Dclose(g)
H5Sclose(sid)
h5closeAll()

But when I read it:

> h5read(filename,"/metadata/", bit64conversion="bit64")
$v
[1] "" "" "" ""

Every value is an empty string. The dimension and type are both right, but the contents are missing.

You can see it's /there/, but I can't extract the data:

> h5ls(filename, all=TRUE)
      group     name         ltype corder_valid corder cset       otype
0         / metadata H5L_TYPE_HARD        FALSE      0    0   H5I_GROUP
1 /metadata        v H5L_TYPE_HARD        FALSE      0    0 H5I_DATASET
  num_attrs dclass      dtype  stype rank dim maxdim
0         0                             0           
1         0 STRING H5T_STRING SIMPLE    1   4      4

I can /read/ HDF5 files with rhdf5, and I can read and write them with h5py in Python, so I know the machine is set up with the right HDF5 binaries. So what am I doing wrong that I can't write HDF5 character vectors in R?

Mittenchops

2 Answers


Why not just use the high-level functions instead:

if(file.exists(filename)) {
  file.remove(filename)
}
h5createFile(filename)
h5createGroup(file = filename, group = "/metadata")
h5write(file = filename, obj = v, name = "/metadata/v")
h5closeAll()
h5ls(filename)
h5read(file = filename, name = "/metadata/v")
biomiha

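The issue here is that with the "H5T_C_S1" datatype you have to set the length of the strings it is going to hold. Out of the box, H5T_C_S1 is only one byte wide, so there is no room for any characters and every value comes back as an empty string. You can set the size via: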

tid <- H5Tcopy("H5T_C_S1")
H5Tset_size(tid, size = 4)

An alternative is to use size = NULL, which allows variable-length strings. The difference between the two options is discussed in more detail on the manual page for the h5createDataset() function.
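A minimal low-level sketch of the variable-length route follows. It assumes an rhdf5 version recent enough for H5Tset_size() to accept size = NULL and for H5Dwrite() to handle variable-length strings. The longer fourth string is only there to show that no width has to be chosen up front.

```r
library(rhdf5)

v <- c("val1", "val2", "cat", "a much longer string")
filename <- tempfile(fileext = ".h5")

fid <- H5Fcreate(filename)
gid <- H5Gcreate(fid, "/metadata")

# Variable-length string type: size = NULL means no fixed width,
# so each element is stored with its own length
tid <- H5Tcopy("H5T_C_S1")
H5Tset_size(tid, size = NULL)

# 1-D dataspace with one slot per string
sid <- H5Screate_simple(length(v))
did <- H5Dcreate(gid, "v", dtype_id = tid, sid)
H5Dwrite(did, v, h5spaceMem = sid, h5spaceFile = sid)

# Close every open handle before reading the file back
h5closeAll()

res <- h5read(filename, "/metadata/v")
print(res)
```

The trade-off is the one the manual page describes: variable-length strings avoid picking a width, while fixed-length strings keep every element the same size on disk.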

For completeness, here's a full example of writing and then reading the dataset with {rhdf5}:

library(rhdf5)

v <- c("val1", "val2", "cat", "dog")

filename <- tempfile(fileext = ".h5")
fid <- H5Fcreate(filename)    
g2 <- H5Gcreate(fid, "/metadata") 

## create a string datatype
tid <- H5Tcopy("H5T_C_S1")
## set a fixed size that accommodates all the strings to be written
H5Tset_size(tid, size = 4)

sid <- H5Screate_simple(NROW(v))
g <- H5Dcreate(g2, "v", dtype_id = tid, sid) 
H5Dwrite(g, v, h5spaceMem = sid, h5spaceFile = sid)

h5closeAll()


h5read(filename, name = "/metadata/")
#> $v
#> [1] "val1" "val2" "cat"  "dog"
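The fixed width of 4 above is hard-coded to match the longest string in v. A sketch that derives it from the data instead follows, assuming one extra byte of headroom for a null terminator is acceptable. The name str_width is purely illustrative.

```r
library(rhdf5)

v <- c("val1", "val2", "cat", "dog")
filename <- tempfile(fileext = ".h5")

fid <- H5Fcreate(filename)
gid <- H5Gcreate(fid, "/metadata")

# Width in bytes (not characters, which matters for UTF-8 input) of the
# longest string, plus one spare byte for a terminator
str_width <- max(nchar(v, type = "bytes")) + 1
tid <- H5Tcopy("H5T_C_S1")
H5Tset_size(tid, size = str_width)

sid <- H5Screate_simple(length(v))
did <- H5Dcreate(gid, "v", dtype_id = tid, sid)
H5Dwrite(did, v, h5spaceMem = sid, h5spaceFile = sid)
h5closeAll()

res <- h5read(filename, "/metadata/v")
# as.vector() drops any dim attribute so only the values are compared
stopifnot(identical(as.vector(res), v))
```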
Grimbough