Given a binary file of numerical values, I can read it in using numpy.fromfile(). This allocates a new array for the data. Say I already have an array a and I want to read into this array. I'd have to do something like
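import numpy as np

size = 1_000_000_000    # number of elements in the preallocated array
size_chunk = 1_000_000  # number of elements to read from the file
a = np.empty(size, dtype=np.double)
with open('filename', 'rb') as f:
    # fromfile() allocates a new temporary array holding the chunk
    tmp = np.fromfile(f, dtype=np.double, count=size_chunk)
# copy the temporary into the first part of a
a[:size_chunk] = tmp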
where, to make things general, a is larger than the data read into tmp. I want to avoid the memory penalty caused by tmp by reading directly into a. Note that though
a[:size_chunk] = np.fromfile(f, dtype=np.double, count=size_chunk)
hides the tmp variable, the unnecessary temporary memory is still there.
I imagine something like
np.fromfile(f, dtype=np.double, count=size_chunk, into=a[:size_chunk])
though no such into keyword is implemented.
How can I achieve this? I'm open to using SciPy or other Python packages as well. I'll note that the h5py package has a read_direct() method which does what I want, except my data file is raw binary and not in HDF5 format.
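
For reference, one possible direction, offered as a sketch rather than a definitive answer: a file opened in binary mode has a readinto() method that fills any writable, contiguous buffer, and a slice such as a[:size_chunk] of a contiguous 1-D NumPy array is such a buffer. The helper name read_into, the small array sizes, and the temporary demo file below are assumptions made only for illustration. The sketch also assumes the file holds values in the machine's native byte order.

```python
import os
import tempfile

import numpy as np


def read_into(f, out):
    """Fill `out` with raw bytes read from the binary file object `f`.

    Hypothetical helper: `out` must be a writable, C-contiguous array
    (for example a slice a[:n] of a contiguous 1-D array).
    Returns the number of whole items that were read.
    """
    if not (out.flags.c_contiguous and out.flags.writeable):
        raise ValueError("out must be a writable, C-contiguous array")
    # readinto() writes straight into the memory backing `out`,
    # so no temporary NumPy array is created for the chunk.
    nbytes = f.readinto(out)
    return nbytes // out.itemsize


# Small demo sizes (assumed; much smaller than in the question).
size = 1_000
size_chunk = 100
expected = np.arange(size_chunk, dtype=np.double)

a = np.zeros(size, dtype=np.double)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.bin")
    expected.tofile(path)  # write a raw binary file to read back
    with open(path, "rb") as f:
        n_read = read_into(f, a[:size_chunk])
```

The return value is worth checking: if the file ends before the buffer is full, fewer items than requested are read and the remainder of the slice is left untouched.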