I'm using the 'rhdf5' package to read a large HDF5 file (2 GB) containing about 5000 objects. I have to use this package since it appears to be the only one that supports 64-bit integers via the 'bit64' package.
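For example, a single dataset holding 64-bit integers reads like this (a minimal sketch; 'example.h5' and '/group/dset64' are placeholder names):
library(rhdf5)
library(bit64)
x = h5read('example.h5', '/group/dset64', bit64conversion='bit64')
class(x)  # 'integer64' rather than a lossy double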
The problem is that reading all the objects like this is very time-consuming:
library(rhdf5)
library(bit64)
library(parallel)
h5_file = 'data.h5'  # placeholder path to the 2 GB file
groups = h5ls(h5_file)
is_dataset = groups$otype == 'H5I_DATASET'
# 'obj_names' now holds the full path of every dataset contained in 'h5_file'
obj_names = paste(groups$group[is_dataset], groups$name[is_dataset], sep='/')
# read a single dataset, converting 64-bit integers to bit64::integer64
h5read_by_name <- function(x) {
  h5read(file=h5_file, name=x, bit64conversion='bit64')
}
# read every dataset on 2 cores and row-bind the results
h5data = do.call(rbind, mclapply(obj_names, h5read_by_name, mc.cores=2))
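To give a sense of the scale, this is how I estimate the serial cost (a sketch; it just times the first dataset and extrapolates):
t1 = system.time(h5read_by_name(obj_names[[1]]))
t1['elapsed'] * length(obj_names)  # projected seconds for a fully serial read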
Even using two cores to parallelize the reads, the whole pass still takes days. If I use more cores, the stack size explodes, and I'm already at the hard limit.
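For reference, this is how I inspect those limits (assuming a Linux shell; Cstack_info() is base R):
Cstack_info()           # R's view of the C stack size and current usage
system('ulimit -s -H')  # hard limit on the stack size, per the shell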
Any ideas?