I need to share a large dataset from an HDF5 file between multiple processes, and for various reasons mmap is not an option.
So I read it into a NumPy array and then copy that array into shared memory, like this:
import numpy as np
import h5py
from multiprocessing import shared_memory

dataset = h5py.File(args.input, 'r')['data']

# Allocate a shared-memory block large enough for the whole dataset
shm = shared_memory.SharedMemory(
    name=memory_label,
    create=True,
    size=dataset.nbytes
)

# Wrap the shared buffer in an ndarray and copy the data into it
shared_tracemap = np.ndarray(dataset.shape, dtype=dataset.dtype, buffer=shm.buf)
shared_tracemap[:] = dataset[:]
But this approach temporarily doubles the required memory, because dataset[:] materializes a full in-memory copy of the data before it is written into the shared buffer. Is there a way to read the dataset directly into the SharedMemory block?
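The only workaround I can think of is copying in chunks so that the temporary never holds more than one slice at a time. Roughly like this, continuing from the snippet above (just a sketch; the chunk size along the first axis is arbitrary):

CHUNK = 1024  # rows copied per step, chosen arbitrarily
for start in range(0, dataset.shape[0], CHUNK):
    stop = min(start + CHUNK, dataset.shape[0])
    # each slice read creates only a small temporary before landing in shared memory
    shared_tracemap[start:stop] = dataset[start:stop]

But that still goes through intermediate buffers and adds bookkeeping, so I would prefer a way to have h5py write straight into shm.buf.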