Issue:
I am trying to repack a subset of rows and fields from a large numpy structured array. When I select rows with a slice, repack_fields works, but when I select them with a range it does not. Indexing with range before repack_fields appears to allocate each selected row at the full itemsize of the original array.
Example:
Below is an example where I limit the available memory in order to reproduce the error I am observing in my use case.
import numpy as np
from numpy.lib.recfunctions import repack_fields
import resource

# Cap the address space at 1.5 GB to reproduce the MemoryError.
resource.setrlimit(resource.RLIMIT_AS, (int(1.5e9), int(1.5e9)))

H = np.zeros(100, dtype=[('f1', int), ('f2', int), ('large', float, 1000000)])

print('Using slicing: ')
repack_fields(H[['f1', 'f2']][0:50])

print('Using range: ')
repack_fields(H[['f1', 'f2']][range(0, 50)])
produces the output:
Using slicing:
Using range:
Traceback (most recent call last):
File "test.py", line 12, in <module>
repack_fields(H[['f1', 'f2']][range(0, 50)])
MemoryError: Unable to allocate 381. MiB for an array with shape (50,) and data type {'names':['f1','f2'], 'formats':['<i8','<i8'], 'offsets':[0,8], 'itemsize':8000016}
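A minimal sketch of what seems to be going on, assuming NumPy >= 1.16 (where multi-field indexing returns a view) and using a smaller 'large' field so it runs without a memory cap: a slice of the field-subset view is still a view, while fancy indexing with range (or a list) makes a copy that keeps the original, padded itemsize.

```python
import numpy as np

H = np.zeros(10, dtype=[('f1', int), ('f2', int), ('large', float, 100)])
sub = H[['f1', 'f2']]  # multi-field index: a view with padded itemsize

# A slice of the view is still a view: no new buffer is allocated.
print(np.shares_memory(sub[0:5], H))        # True

# Fancy indexing copies, and the copy preserves the padded dtype,
# so each row still occupies the full 816 bytes (8 + 8 + 100*8).
print(np.shares_memory(sub[range(5)], H))   # False
print(sub[range(5)].dtype.itemsize)         # 816
```

With the 1000000-element 'large' field above, those 816 bytes become 8000016 bytes per row, which matches the 381 MiB allocation in the traceback.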
Questions:
Why does range(0, 50) behave differently from 0:50? (A list also doesn't work.) I know that in the example above one could repack the fields first and then select the rows; that is, repack_fields(H[['f1', 'f2']])[range(0, 50)] works. But I don't want to have to know whether it is better to take rows first or fields first.
What is the correct way to take a subset of rows and fields from a large numpy structured array, even when the rows are not consecutive?