The .npy files are each around 5 GB and I have about 5 GB of RAM, so I cannot load both NumPy arrays at once. How can I load one .npy file and append its rows to the other .npy file without loading that one into memory?
1 Answer
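An npy file is a header containing the metadata (data type and shape), followed by the data itself. The header ends with a '\n' (newline) character. So, open your first file in binary append mode, then open the second file in binary read mode, skip its header with readline(), then copy chunks (using read(size)) from the second file to the first. This assumes both arrays have the same data type and trailing dimensions and are stored in C order, so that appending bytes is the same as appending rows.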
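The copy step described above can be sketched roughly as follows. This is only an illustration: the function name `append_npy_payload` and the `chunk_size` default are made up for this example, and the header of the first file is left untouched, so its shape field is stale until it is fixed separately.

```python
def append_npy_payload(dst_path, src_path, chunk_size=64 * 1024 * 1024):
    """Append the raw array bytes of src_path to the end of dst_path.

    Hypothetical helper. Assumes both files hold arrays with the same
    dtype, trailing dimensions and C order. Does NOT update dst's header.
    Returns the number of bytes copied.
    """
    copied = 0
    with open(dst_path, "ab") as dst, open(src_path, "rb") as src:
        # The header ends with b"\n"; readline() stops at the first newline
        # byte. This assumes no newline byte occurs earlier, inside the
        # binary length field that precedes the header text.
        src.readline()
        while True:
            chunk = src.read(chunk_size)  # at most chunk_size bytes in RAM
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

Memory use is bounded by `chunk_size`, not by the size of either file.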
There is only one thing left: to update the shape (length) field in the header. And here it gets a bit tricky, because if the two files had for example the shapes (700,) and (400,), the new shape needs to be (1100,), which is one character longer, and you may not have space in the header for it. This depends on how many pad characters were in the original header: sometimes you will have space and sometimes you won't. If there is no space, you will need to write a new header into a new file and then copy the data from both source files. Still, this won't take much memory or time, just a bit of extra disk space.
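To check whether the longer shape would fit, you can read the header by hand and count its padding. The sketch below relies on the npy layout (6-byte magic string, two version bytes, a little-endian header length, then a Python dict literal padded with spaces and terminated by a newline). The names `read_npy_header` and `fits_in_place` are invented for this example, and it assumes growth along the first axis only.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"

def read_npy_header(path):
    """Return (header_dict, pad_spaces, data_offset) for an .npy file."""
    with open(path, "rb") as f:
        if f.read(6) != MAGIC:
            raise ValueError("not an .npy file")
        major, _minor = f.read(2)
        # Version 1 stores the header length in 2 bytes, later versions in 4.
        if major == 1:
            (hlen,) = struct.unpack("<H", f.read(2))
        else:
            (hlen,) = struct.unpack("<I", f.read(4))
        raw = f.read(hlen)
        data_offset = f.tell()
    text = raw.decode("utf-8" if major >= 3 else "latin1")
    header = ast.literal_eval(text)   # the header is a dict literal
    body = text.rstrip("\n")
    pad_spaces = len(body) - len(body.rstrip(" "))
    return header, pad_spaces, data_offset

def fits_in_place(path, extra_rows):
    """True if a first-axis length grown by extra_rows fits in the padding."""
    header, pad_spaces, _ = read_npy_header(path)
    old_rows = header["shape"][0]
    growth = len(str(old_rows + extra_rows)) - len(str(old_rows))
    return growth <= pad_spaces
```

If `fits_in_place` returns False, the header cannot be rewritten at the same length and the new-file route is needed.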
You can see the code which reads and writes npy files here: https://github.com/numpy/numpy/blob/master/numpy/lib/format.py - there are some undocumented functions you may find useful in your quest.
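One function from that module, `numpy.lib.format.open_memmap`, can handle the "new header in a new file" fallback for you: it writes a header for the combined shape and returns a memory-mapped array to fill in. Below is a minimal sketch of that approach, not the only way to do it. The name `concat_npy` and the `rows_per_chunk` parameter are made up for this example, and it assumes non-object arrays with at least one dimension and matching dtype and trailing dimensions.

```python
import numpy as np
from numpy.lib.format import open_memmap

def concat_npy(path_a, path_b, path_out, rows_per_chunk=100_000):
    """Write the rows of path_a followed by path_b into a new .npy file.

    Hypothetical helper. Only rows_per_chunk rows are copied at a time,
    so neither input has to fit in RAM. Returns the output shape.
    """
    a = np.load(path_a, mmap_mode="r")  # memory-mapped, not read into RAM
    b = np.load(path_b, mmap_mode="r")
    if a.dtype != b.dtype or a.shape[1:] != b.shape[1:]:
        raise ValueError("arrays must share dtype and trailing dimensions")

    out_shape = (a.shape[0] + b.shape[0],) + a.shape[1:]
    # Creates the file and writes a fresh header for the combined shape.
    out = open_memmap(path_out, mode="w+", dtype=a.dtype, shape=out_shape)

    pos = 0
    for src in (a, b):
        for start in range(0, src.shape[0], rows_per_chunk):
            chunk = src[start:start + rows_per_chunk]
            out[pos:pos + len(chunk)] = chunk
            pos += len(chunk)

    out.flush()
    del out  # release the memory map
    return out_shape
```

For example, `concat_npy("a.npy", "b.npy", "combined.npy")` would leave both inputs untouched and cost one extra file's worth of disk space, as the answer notes.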

- Great explanation! :) Also it would be helpful if you share the code! Thanks! – charanReddy Oct 01 '17 at 08:20
- I can't share the code because I don't have it. You'll need to write it. – John Zwinck Oct 01 '17 at 08:33