How to show all the element names in an npz file without having to load it completely?

Question

I use the following code to show the element names in an npz file. But it requires loading the file completely, which can be slow, especially when the file is large. Is there a way to extract the element names without having to load the file completely?

import numpy

x = numpy.load(file)
for k in x.keys():
    print(k)

I don't think so - the whole point of this mechanism is to efficiently load the arrays. — kabanus, Mar 11 '18 at 11:31
This could be an XY question. Why not save the names separately in another file? — kabanus, Mar 11 '18 at 11:34
@hpaulj I don't see where it is documented as a lazy loader. https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.load.html — user1424739, Mar 11 '18 at 14:13
A fuller description of the `npz` loader is in the `numpy.lib.npyio.NpzFile` class docs. It's what `load` uses when given an `npz` file. — hpaulj, Mar 11 '18 at 17:30

Sam Macharia · Answer 1 · 2021-05-31T15:49:43.097

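Without reading the entire file into memory, you can access small fragments of large files on disk by using mmap [memmap documentation]; the default mode of numpy.memmap is r+ (open existing file for reading and writing). For .npz archives, though, np.load already returns a lazy NpzFile: the arrays are read only when accessed by key, and mmap_mode only takes effect for plain .npy files.
My test code below uses the NpzFile files attribute [NpzFile documentation] and the 'mnist.npz' test data [mnist.npz link]. Everything seems to be pretty fast in Python 3.6: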

>>> import numpy as np
>>> x = np.load('mnist.npz', mmap_mode='r')
>>> for k in x.files:
...     print(k)
... 
x_test
x_train
y_train
y_test
>>>

Kindly check the linked numpy.memmap documentation for more.

Edit: print(x.files) seems to work fine too.
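
As a minimal sketch of why this is fast: for a `.npz` archive, `np.load` returns an `NpzFile` that reads the zip directory up front and reads each array only when it is accessed by key. The file name `example.npz` and the array names `a` and `b` below are made up for illustration and are not part of the original test.

```python
import numpy as np

# Hypothetical archive with two arrays, created only for this demonstration.
np.savez("example.npz", a=np.arange(10), b=np.zeros((3, 3)))

# NpzFile is a context manager; leaving the block closes the underlying file.
with np.load("example.npz") as x:
    names = list(x.files)   # names come from the zip directory only
    a = x["a"]              # array "a" is actually read at this point

print(names)
```

Using the `with` block is a design choice here: it makes sure the file handle is released even when only the names are needed.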

Or equivalently in one line: `print(x.files)` — David Parks, Sep 11 '19 at 21:32

score 0 · Answer 2 · answered Mar 31 '22 at 21:37

An npz file is actually a zip archive, as you can see from the hexdump:

$ hd data.npz 
00000000  50 4b 03 04 14 00 00 00  00 00 00 00 21 00 5f ab  |PK..........!._.|
00000010  c1 34 c8 00 00 00 c8 00  00 00 06 00 00 00 4b 31  |.4............K1|
00000020  2e 6e 70 79 93 4e 55 4d  50 59 01 00 76 00 7b 27  |.npy.NUMPY..v.{'|
00000030  64 65 73 63 72 27 3a 20  27 3c 66 38 27 2c 20 27  |descr': '<f8', '|
00000040  66 6f 72 74 72 61 6e 5f  6f 72 64 65 72 27 3a 20  |fortran_order': |
00000050  54 72 75 65 2c 20 27 73  68 61 70 65 27 3a 20 28  |True, 'shape': (|
00000060  33 2c 20 33 29 2c 20 7d  20 20 20 20 20 20 20 20  |3, 3), }        |
00000070  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |

So if you don't have Python open, you can use a zip extractor or a file explorer that supports showing the contents of a zip file, or even the shell:

unzip -l data.npz
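
The same idea can be sketched in Python with the standard-library `zipfile` module, since each array is stored as a `<name>.npy` member of the archive. The helpers `npz_names` and `npz_header` and the file `data_demo.npz` are hypothetical names introduced for this example. The header helper assumes the member uses `.npy` format version 1.0 or 2.0 and relies on the reader functions in `numpy.lib.format`.

```python
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def npz_names(path):
    """Return array names in an .npz archive by reading only the zip directory."""
    with zipfile.ZipFile(path) as archive:
        # Each array is stored as a member called "<name>.npy".
        return [n[:-4] if n.endswith(".npy") else n for n in archive.namelist()]


def npz_header(path, name):
    """Return (shape, dtype) of one array by parsing only its .npy header."""
    with zipfile.ZipFile(path) as archive, archive.open(name + ".npy") as member:
        version = npy_format.read_magic(member)
        if version == (1, 0):
            shape, _fortran, dtype = npy_format.read_array_header_1_0(member)
        elif version == (2, 0):
            shape, _fortran, dtype = npy_format.read_array_header_2_0(member)
        else:
            # Assumption: other format versions are out of scope for this sketch.
            raise ValueError(f"unsupported .npy format version: {version}")
        return shape, dtype


# Hypothetical demo archive, created here only so the sketch runs end to end.
np.savez("data_demo.npz", K1=np.zeros((3, 3)), K2=np.arange(5, dtype=np.int32))

names = npz_names("data_demo.npz")
k1_shape, k1_dtype = npz_header("data_demo.npz", "K1")
print(names)
print(k1_shape, k1_dtype)
```

The header fields parsed here are the same `descr` and `shape` entries visible in the hexdump, so shapes and dtypes can be inspected without reading the array data itself.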
