I use numpy.memmap to load only the parts of arrays into memory that I need, instead of loading an entire huge array. I would like to do the same with bool arrays.

Unfortunately, bool memmap arrays aren't stored economically: according to ls, a bool memmap file requires as much space as a uint8 memmap file of the same array shape.

So I use numpy.packbits to save space. Unfortunately, reading the data back with numpy.unpackbits does not seem to be lazy: it is slow and can raise a MemoryError, so apparently it loads the whole array from disk into memory instead of providing a "bool view" on the uint8 array.

So if I want to load only certain entries of the bool array from file, I first have to compute which uint8 entries they are part of, then apply numpy.unpackbits to that, and then again index into that.
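For reference, the manual workaround described above might look roughly like the sketch below. The helper name `read_packed_bits`, the file name `flags.bin`, and the sample data are made up for illustration; it assumes a flat (1-D) bool array packed with the default bit order of numpy.packbits (most significant bit first).

```python
import os
import tempfile

import numpy as np


def read_packed_bits(packed, indices):
    """Return the bool values at flat bit positions `indices`.

    `packed` is a 1-D uint8 array (for example a numpy.memmap) written by
    numpy.packbits with its default bit order. Hypothetical helper.
    """
    indices = np.asarray(indices, dtype=np.int64)
    byte_idx = indices // 8   # which uint8 entry holds each requested bit
    bit_idx = indices % 8     # position inside that byte, 0 = most significant
    # Fancy indexing copies only the requested bytes into memory.
    needed = np.asarray(packed[byte_idx], dtype=np.uint8)
    # Unpack just those bytes: one row of 8 bits per requested entry.
    bits = np.unpackbits(needed[:, None], axis=1)
    # Index again to pick the wanted bit out of each row.
    return bits[np.arange(len(indices)), bit_idx].astype(bool)


# Build a small bit-packed file to try it on (sample data only).
rng = np.random.default_rng(0)
flags = rng.random(1000) < 0.5
path = os.path.join(tempfile.mkdtemp(), "flags.bin")
np.packbits(flags).tofile(path)

packed = np.memmap(path, dtype=np.uint8, mode="r")
wanted = [0, 7, 8, 123, 999]
result = read_packed_bits(packed, wanted)
```

Only the bytes that contain the requested bits are read and unpacked, so memory use scales with the number of requested entries rather than with the file size.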

Isn't there a lazy way to get a "bool view" on the bit-packed memmap file?

root

1 Answer

Not possible. The memory layout of a bit-packed array is incompatible with what you're looking for. The NumPy shape-and-strides model of array layout does not have sub-byte resolution. Even if you were to create a class that emulated the view you want, trying to use it with normal NumPy operations would require materializing a representation NumPy can work with, at which point you'd have to spend the memory you don't want to spend.
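As a small illustration of the byte resolution (a sketch with made-up variable names, not a full explanation of NumPy internals): a bool array spends one byte per element and its strides are counted in whole bytes, so there is no stride value that could step from one bit to the next inside a packed byte.

```python
import numpy as np

flags = np.zeros(16, dtype=bool)
packed = np.packbits(flags)

bool_itemsize = flags.itemsize   # bytes used per bool element
bool_strides = flags.strides     # step between neighbours, in bytes
bool_nbytes = flags.nbytes       # 16 elements stored as 16 bytes
packed_nbytes = packed.nbytes    # the same 16 flags packed into 2 bytes

print(bool_itemsize, bool_strides, bool_nbytes, packed_nbytes)
```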

user2357112
  • Thanks! Please add a source for this info, if possible. Could rewriting NumPy help, or are there fundamental hardware limitations (e.g. some very low-level component that prevents "collecting" nonconsecutive bits from different places into an array)? – root Jun 14 '18 at 21:53
  • @root: Modern computer architectures don't have bit-addressable memory, so trying to support this would be very expensive. As for sources, [here's one](https://docs.scipy.org/doc/numpy/reference/internals.html#internal-organization-of-numpy-arrays) for NumPy array representation. – user2357112 Jun 15 '18 at 20:53