
I have many (nearly 10,000,000) .npz files (for example: file1.npz, file2.npz, ..., file10000000.npz) in a directory. Each .npz file contains two variables, a and b, which hold array data. I want to combine all of these .npz files into a single big .npz file, and later read the contents of each original file back from the big file, one at a time.

I tried the code below, but I can't figure out how to solve this problem. I would appreciate any help.

import glob
import numpy as np
paths = glob.glob('home/lijun/data/*.npz')
for npz_file in paths:
    np.vstack(npz_file)  # my attempt; this does not combine the files
  • A `npz` file is actually a `zip` archive. Each component file is a `.npy` for one array. An OS archive tool can probably nest those `npz` within another. The `savez` (and `load`) function does not have such a nesting ability. – hpaulj Oct 10 '20 at 19:40
  • `np.vstack` is a call to `concatenate` that joins arrays as rows. Given a list of arrays it returns one new array. It does not join files, `npz` or any other. – hpaulj Oct 10 '20 at 20:11
  • Unfortunately, `np.savez` doesn't allow for an append, but it's just a simple wrapper around [Python zipfile](https://docs.python.org/2/library/zipfile.html) and `zipfile` can append to zip archives. This is easy to do (and what we did in the old days, before `savez`). You can copy the model [here](https://github.com/numpy/numpy/blob/v1.19.0/numpy/lib/npyio.py#L689-L754), used for `savez`, and just take out the few lines you would need (basically replacing `'w'` with `'a'`). – tom10 Oct 10 '20 at 21:06
  • Can we do it before `savez`? –  Oct 11 '20 at 04:32
  • If so, please let me know how I can update the code. –  Oct 11 '20 at 04:32
  • I don't know what you mean by "can we do it before `savez`". I would suggest, though, that if you want to use `zipfile` in the end, you skip `savez` and just use `zipfile`. (You'll probably still be able to open the result with `np.load`.) – tom10 Oct 11 '20 at 14:02
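
The `zipfile` approach suggested in the comments could look roughly like the sketch below. It is only an illustration under some assumptions: the helper names `append_npz` and `read_pair` are made up here, each small file is assumed to hold arrays named `a` and `b` as in the question, and each array is stored in the big archive under a hypothetical `<prefix>/<name>.npy` entry so that `np.load` can find it by the key `<prefix>/<name>`. It has not been tried at the 10,000,000-file scale, where opening the archive may be slow.

```python
import io
import zipfile

import numpy as np


def append_npz(big_path, small_path, prefix):
    """Append every array of one small .npz to the big archive (hypothetical helper)."""
    # mode="a" creates the archive if it does not exist yet and appends otherwise.
    with np.load(small_path) as small, \
            zipfile.ZipFile(big_path, mode="a", allowZip64=True) as big:
        for name in small.files:
            buf = io.BytesIO()
            # Serialize the array in .npy format, which is what an .npz member is.
            np.save(buf, small[name], allow_pickle=False)
            big.writestr(f"{prefix}/{name}.npy", buf.getvalue())


def read_pair(big_path, prefix):
    """Read back the a and b arrays that were stored under one prefix."""
    with np.load(big_path) as big:
        return big[f"{prefix}/a"], big[f"{prefix}/b"]
```

A call such as `append_npz("big.npz", "file1.npz", "file1")` inside a loop over the `glob` results would build the archive, and `read_pair("big.npz", "file1")` would return that file's two arrays. When reading many entries, it is probably better to keep a single `np.load` handle open rather than reopening the archive for every read.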
