
I am currently compressing a numpy array using blosc with the following:

import blosc
import numpy as np

view = memoryview(large_np_arr)
compressed = blosc.compress(view, typesize=8)

and decompressing like so:

decompressed = blosc.decompress(compressed)
decompressed_arr = np.frombuffer(decompressed, dtype=np.float64)

np.frombuffer() returns a 1-d array. Is there any way, or a standard pattern, to include the array's metadata (e.g. shape, dtype) in the compressed buffer?

[I know there is blosc.pack_array(), but that makes a copy of the data when pickling, which I would like to avoid.]

morfys
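
One pattern, absent a library helper, is to prepend a small self-describing header (shape and dtype) to the compressed payload and strip it off again before np.frombuffer(). A minimal sketch; pack_with_meta/unpack_with_meta are illustrative names, not part of blosc:

import json
import struct
import blosc
import numpy as np

def pack_with_meta(arr):
    # compress only the raw buffer; shape and dtype go into a tiny JSON header
    arr = np.ascontiguousarray(arr)              # blosc needs a contiguous buffer
    header = json.dumps({"shape": arr.shape, "dtype": arr.dtype.str}).encode()
    payload = blosc.compress(memoryview(arr), typesize=arr.dtype.itemsize)
    # 4-byte header length + header + compressed payload in one bytes object;
    # only the already-compressed payload is concatenated here, so the
    # uncompressed array data is never copied
    return struct.pack("<I", len(header)) + header + payload

def unpack_with_meta(buf):
    hlen = struct.unpack_from("<I", buf)[0]
    meta = json.loads(buf[4:4 + hlen])
    flat = np.frombuffer(blosc.decompress(buf[4 + hlen:]), dtype=meta["dtype"])
    return flat.reshape(meta["shape"])           # reshape returns a view, no copy
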
  • `np.save` writes a block with that metadata in addition to the data buffer. – hpaulj May 14 '20 at 00:35
  • yes, but I believe np.save() writes to a file or file-like object, which I'd like to avoid because it duplicates the array data in-memory. – morfys May 14 '20 at 01:26
  • but that's the only place where both are available in one contiguous block. – hpaulj May 14 '20 at 01:50
  • np.save() invokes format.write_array(fid, arr,...) where fid is any file-like object (e.g. BytesIO buffer), and it first writes the header and then the raw data. I think that may be the way to go. The format module also has read_array(). Thanks. – morfys May 14 '20 at 03:09
  • Argh, but format.write_array() would still create another copy of the data :(. I'd rather somehow find a way to prepend the _array_header, and then use the memoryview of the array. – morfys May 14 '20 at 03:26
  • I found that the package bloscpack does exactly what I need. It has the ability to compress a numpy array with its metadata without making an extra copy of the data. – morfys May 15 '20 at 14:14
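
For reference, the np.save()/format.write_array() route discussed in the comments looks roughly like this, sketched with a BytesIO buffer; as noted above, it still stages an extra uncompressed copy of the data in memory before compression:

import io
import blosc
import numpy as np

buf = io.BytesIO()
np.lib.format.write_array(buf, large_np_arr)      # .npy header followed by the raw data
compressed = blosc.compress(buf.getbuffer(), typesize=8)

restored = np.lib.format.read_array(io.BytesIO(blosc.decompress(compressed)))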

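And the bloscpack approach from the last comment, sketched with its pack_ndarray_to_bytes/unpack_ndarray_from_bytes helpers (assumed names; older releases call them pack_ndarray_str/unpack_ndarray_str, so check the bloscpack docs for your version):

import bloscpack as bp

packed = bp.pack_ndarray_to_bytes(large_np_arr)   # compressed bytes, shape/dtype metadata included
restored = bp.unpack_ndarray_from_bytes(packed)   # reconstructs the original ndarray
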
0 Answers