
I am currently compressing a numpy array using blosc with the following:

import blosc
import numpy as np

view = memoryview(large_np_arr)
compressed = blosc.compress(view, typesize=8)

and decompressing like so:

decompressed = blosc.decompress(compressed)
decompressed_arr = np.frombuffer(decompressed, dtype=np.float64)

np.frombuffer() returns a 1-d array. Is there any way, or a standard pattern, to include the array's metadata (e.g. shape, dtype) in the compressed buffer?

[I know there is blosc.pack_array(), but that makes a copy of the data when pickling, which I would like to avoid.]

morfys
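
One pattern, absent a library helper, is to prepend a small self-describing header (shape and dtype) to the compressed payload and strip it off again before np.frombuffer(). A minimal sketch; pack_with_meta/unpack_with_meta are illustrative names, not part of blosc:

import json
import struct
import blosc
import numpy as np

def pack_with_meta(arr):
    # compress only the raw buffer; shape and dtype go into a tiny JSON header
    arr = np.ascontiguousarray(arr)              # blosc needs a contiguous buffer
    header = json.dumps({"shape": arr.shape, "dtype": arr.dtype.str}).encode()
    payload = blosc.compress(memoryview(arr), typesize=arr.dtype.itemsize)
    # 4-byte header length + header + compressed payload in one bytes object;
    # only the already-compressed payload is concatenated here, so the
    # uncompressed array data is never copied
    return struct.pack("<I", len(header)) + header + payload

def unpack_with_meta(buf):
    hlen = struct.unpack_from("<I", buf)[0]
    meta = json.loads(buf[4:4 + hlen])
    flat = np.frombuffer(blosc.decompress(buf[4 + hlen:]), dtype=meta["dtype"])
    return flat.reshape(meta["shape"])           # reshape returns a view, no copy
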
  • `np.save` writes a block with that metadata in addition to the data buffer. – hpaulj May 14 '20 at 00:35
  • yes, but I believe np.save() writes to a file or file-like object, which I'd like to avoid because it duplicates the array data in-memory. – morfys May 14 '20 at 01:26
  • but that's the only place where both are available in one contiguous block. – hpaulj May 14 '20 at 01:50
  • np.save() invokes format.write_array(fid, arr,...) where fid is any file-like object (e.g. BytesIO buffer), and it first writes the header and then the raw data. I think that may be the way to go. The format module also has read_array(). Thanks. – morfys May 14 '20 at 03:09
  • Argh, but format.write_array() would still create another copy of the data :(. I'd rather somehow find a way to prepend the _array_header, and then use the memoryview of the array. – morfys May 14 '20 at 03:26
  • I found that the package bloscpack does exactly what I need. It has the ability to compress a numpy array with its metadata without making an extra copy of the data. – morfys May 15 '20 at 14:14
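
For reference, the np.save()/format.write_array() route discussed in the comments looks roughly like this, sketched with a BytesIO buffer; as noted above, it still stages an extra uncompressed copy of the data in memory before compression:

import io
import blosc
import numpy as np

buf = io.BytesIO()
np.lib.format.write_array(buf, large_np_arr)      # .npy header followed by the raw data
compressed = blosc.compress(buf.getbuffer(), typesize=8)

restored = np.lib.format.read_array(io.BytesIO(blosc.decompress(compressed)))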

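And the bloscpack approach from the last comment, sketched with its pack_ndarray_to_bytes/unpack_ndarray_from_bytes helpers (assumed names; older releases call them pack_ndarray_str/unpack_ndarray_str, so check the bloscpack docs for your version):

import bloscpack as bp

packed = bp.pack_ndarray_to_bytes(large_np_arr)   # compressed bytes, shape/dtype metadata included
restored = bp.unpack_ndarray_from_bytes(packed)   # reconstructs the original ndarray
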
0 Answers