I use the following code to show the element names in an npz file. However, it requires loading the file completely, which can be slow, especially when the file is large. Is there a way to extract the element names without loading the whole file?

import numpy

x = numpy.load(file)  # file: path to the .npz archive
for k in x.keys():
    print(k)
user1424739
  • I don't think so - the whole point of this mechanism is to efficiently load the arrays. – kabanus Mar 11 '18 at 11:31
  • This could be an XY question. Why not save the names separately in another file? – kabanus Mar 11 '18 at 11:34
  • `list(x.keys())`. Check the docs. It's a lazy loader. – hpaulj Mar 11 '18 at 13:10
  • @hpaulj I don't see where it is documented as a lazy loader. https://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.load.html – user1424739 Mar 11 '18 at 14:13
  • A fuller description of the `npz` loader is in the `numpy.lib.npyio.NpzFile` class docs. It's what `load` uses when given an `npz` file. – hpaulj Mar 11 '18 at 17:30

2 Answers

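You can access small fragments of large files on disk without reading the entire file into memory by using mmap [memmap documentation]. The default mode for numpy.memmap is r+ (open existing file for reading and writing); the example below passes mmap_mode='r' for read-only access. Note that the NpzFile object returned by np.load is itself lazy: each array is read from disk only when it is accessed.
My test code below uses the NpzFile files attribute [NpzFile documentation] and the 'mnist.npz' test data [mnist.npz link]. Everything seems to be pretty fast in Python 3.6: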

>>> import numpy as np
>>> x = np.load('mnist.npz', mmap_mode='r')
>>> for k in x.files:
...     print(k)
... 
x_test
x_train
y_train
y_test
>>> 

See the linked numpy.memmap documentation for more.

Edit: print(x.files) seems to work fine too.

Sam Macharia

An npz file is actually a zip archive, as you can see from the hexdump (note the PK signature at the start, followed by the name of the first member, K1.npy):

$ hd data.npz 
00000000  50 4b 03 04 14 00 00 00  00 00 00 00 21 00 5f ab  |PK..........!._.|
00000010  c1 34 c8 00 00 00 c8 00  00 00 06 00 00 00 4b 31  |.4............K1|
00000020  2e 6e 70 79 93 4e 55 4d  50 59 01 00 76 00 7b 27  |.npy.NUMPY..v.{'|
00000030  64 65 73 63 72 27 3a 20  27 3c 66 38 27 2c 20 27  |descr': '<f8', '|
00000040  66 6f 72 74 72 61 6e 5f  6f 72 64 65 72 27 3a 20  |fortran_order': |
00000050  54 72 75 65 2c 20 27 73  68 61 70 65 27 3a 20 28  |True, 'shape': (|
00000060  33 2c 20 33 29 2c 20 7d  20 20 20 20 20 20 20 20  |3, 3), }        |
00000070  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |

So if you don't have Python open, you can use a zip extractor or a file explorer that can show the contents of a zip file, or even the shell:

unzip -l data.npz
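
The same idea can be sketched in Python with only the standard library. This is a minimal sketch, not NumPy's own mechanism: the helper name npz_array_names is hypothetical, and it assumes each array is stored as a member called <name>.npy, which matches the K1.npy entry in the hexdump above. zipfile reads the archive's directory of members, not the array data.

```python
import zipfile


def npz_array_names(path):
    """Return the array names in an .npz archive without reading array data.

    Hypothetical helper: assumes members are named '<array name>.npy'.
    """
    with zipfile.ZipFile(path) as archive:
        # namelist() comes from the zip directory; member contents are not read.
        return [
            name[:-len(".npy")] if name.endswith(".npy") else name
            for name in archive.namelist()
        ]
```

For example, npz_array_names('data.npz') would list the names that unzip -l shows, minus the .npy suffix.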
qwr