The .npy files are around 5 GB each and RAM is also around 5 GB, so I cannot load both NumPy arrays at once. How can I load one .npy file and append its rows to the other .npy file without loading that one into memory?

charanReddy

1 Answer

An .npy file consists of a header, which records the data type and shape (the metadata), followed by the raw array data itself.

The header ends with a '\n' (newline) character. So, open your first file in binary append mode ('ab'), open the second file in binary read mode ('rb'), skip its header with readline(), then copy chunks (using read(size)) from the second file to the first.
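The copy step described above can be sketched roughly as follows. The function name `append_npy_payload` and the 16 MB default chunk size are illustrative choices, not part of NumPy. The sketch assumes both arrays have the same dtype and row layout, and that the first newline byte in the source file really is the end of its header, which holds for typical headers. It leaves the destination header untouched, so the shape stored there is stale until it is fixed separately.

```python
def append_npy_payload(dst_path, src_path, chunk_size=16 * 1024 * 1024):
    """Append the raw array bytes of src_path to the end of dst_path.

    Hypothetical helper: it does not update the header of dst_path,
    so the shape recorded there is wrong until it is rewritten.
    Returns the number of payload bytes copied.
    """
    copied = 0
    with open(dst_path, "ab") as dst, open(src_path, "rb") as src:
        # Skip the source header; assumed to end at the first b"\n".
        src.readline()
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

Only one chunk is held in memory at a time, so peak memory use is bounded by `chunk_size` rather than by the size of either file.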

There is only one thing left: to update the shape (length) field in the header. And here it gets a bit tricky, because if the two files had, for example, the shapes (700,) and (400,), the new shape needs to be (1100,), which is one character longer, and you may not have space in the header for it. This depends on how many pad characters were in the original header: sometimes you will have space and sometimes you won't. If there is no space, you will need to write a new header into a new file and then copy the data from both source files. Still, this won't take much memory or time, just a bit of extra disk space.
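A minimal sketch of the fallback that always works (new header in a new file, then stream the data from both sources) might look like this. `read_npy_header` and `concat_npy` are hypothetical names. The sketch assumes both files use format version 1.0 or 2.0, are C-ordered, share the same dtype and trailing dimensions, and do not hold Python objects. It relies on the helper functions in `numpy.lib.format` (`read_magic`, `read_array_header_1_0`, `read_array_header_2_0`, `write_array_header_1_0`, `dtype_to_descr`).

```python
import shutil
import numpy as np
from numpy.lib import format as npy_format


def read_npy_header(fp):
    """Parse the header of an open binary file; fp ends up at the first data byte."""
    version = npy_format.read_magic(fp)
    if version == (1, 0):
        return npy_format.read_array_header_1_0(fp)
    if version == (2, 0):
        return npy_format.read_array_header_2_0(fp)
    raise ValueError(f"unsupported .npy version {version}")


def concat_npy(first_path, second_path, out_path, chunk_size=16 * 1024 * 1024):
    """Write out_path as the row-wise concatenation of two .npy files.

    Hypothetical helper: only the headers are parsed; the array data is
    streamed in chunks and never loaded as a whole.
    """
    with open(first_path, "rb") as a, open(second_path, "rb") as b:
        shape_a, fortran_a, dtype_a = read_npy_header(a)
        shape_b, fortran_b, dtype_b = read_npy_header(b)
        if fortran_a or fortran_b:
            raise ValueError("byte-wise concatenation needs C-ordered arrays")
        if dtype_a.hasobject or dtype_b.hasobject:
            raise ValueError("object arrays are pickled and cannot be streamed")
        if not shape_a or not shape_b:
            raise ValueError("0-d arrays have no rows to append")
        if dtype_a != dtype_b or shape_a[1:] != shape_b[1:]:
            raise ValueError("dtype and trailing dimensions must match")

        new_shape = (shape_a[0] + shape_b[0],) + shape_a[1:]
        header = {
            "descr": npy_format.dtype_to_descr(dtype_a),
            "fortran_order": False,
            "shape": new_shape,
        }
        with open(out_path, "wb") as out:
            # Writes the magic string, the header length and the padded header.
            npy_format.write_array_header_1_0(out, header)
            # Both inputs are already positioned just past their headers.
            shutil.copyfileobj(a, out, chunk_size)
            shutil.copyfileobj(b, out, chunk_size)
    return new_shape
```

Afterwards, something like `np.load(out_path, mmap_mode="r")` can be used to spot-check the result without reading the whole file into RAM.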

You can see the code which reads and writes npy files here: https://github.com/numpy/numpy/blob/master/numpy/lib/format.py - there are some undocumented functions you may find useful in your quest.

John Zwinck