I'm working with some very large arrays. One issue I'm dealing with, of course, is running out of RAM, but even before that my code runs so slowly that, even with infinite RAM, it would still take far too long. Here is a bit of my code to show what I'm trying to do:
import numpy as np

# samplez is a 3 million element 1-D array
# zfit is a 10,000 x 500 2-D array
b = np.arange(len(zfit))
for x in samplez:
    a = x - zfit
    mask = np.ma.masked_array(a)
    mask[a <= 0] = np.ma.masked
    index = mask.argmin(axis=1)
    # These four lines give me, for each row, the index of the smallest
    # positive number in x - zfit
    d = zfit[b, index]
    e = zfit[b, index + 1]
    f = (x - d) / (e - d)
    # f is the calculation I am after
    if x == samplez[0]:
        g = f
        index_stack = index
    else:
        g = np.vstack((g, f))
        index_stack = np.vstack((index_stack, index))
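For what it's worth, I think the masked-array step could also be written without np.ma (which I've read adds overhead) by hiding the non-positive differences with np.inf instead; something like this inside the loop, assuming every row of zfit has at least one value below x:

a = x - zfit
a_pos = np.where(a > 0, a, np.inf)   # non-positive differences become +inf
index = a_pos.argmin(axis=1)         # column of the smallest positive difference, per row

But that alone doesn't remove the Python-level loop over 3 million samples.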
I need to use g and index_stack, each of which is a 3,000,000 x 10,000 2-D array (around 240 GB apiece at 8 bytes per element), in a further calculation. Each iteration of this loop takes almost 1 second, so about 3 million seconds in total (over a month), which is far too long.
Is there anything I can do to make this calculation run much faster? I've tried to think of a way to do without the for loop, but the only way I can imagine is making 3 million copies of zfit, which is infeasible.
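I suppose a middle ground would be to broadcast against a chunk of samplez at a time instead of all 3 million samples at once, so only a (chunk, 10000, 500) block of differences exists in memory at any moment. Here is a rough sketch of what I have in mind, with small made-up sizes so it actually runs (the real chunk size would need tuning to the available RAM), though I'm not sure it's the right direction:

import numpy as np

# small synthetic stand-ins for samplez and zfit, just to illustrate the idea
samplez = np.random.uniform(0.3, 0.7, size=3000)    # real size: 3 million
zfit = np.sort(np.random.rand(100, 500), axis=1)    # real size: 10,000 x 500;
                                                     # rows sorted, as the (x-d)/(e-d) step seems to assume
b = np.arange(len(zfit))

chunk = 50
for start in range(0, len(samplez), chunk):
    xs = samplez[start:start + chunk]                # shape (c,)
    diff = xs[:, None, None] - zfit[None, :, :]      # shape (c, rows, cols) via broadcasting
    diff = np.where(diff > 0, diff, np.inf)          # hide non-positive differences
    index = diff.argmin(axis=2)                      # shape (c, rows)
    d = zfit[b, index]                               # shape (c, rows)
    e = zfit[b, index + 1]
    f = (xs[:, None] - d) / (e - d)                  # same f as in my loop, for a whole chunk
    # each chunk of f and index would have to be used or stored right away,
    # since the full g and index_stack won't fit in RAM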
And is there some way I can work with these arrays without keeping everything in RAM? I'm a beginner, and everything I've found when searching about this is either irrelevant or something I can't understand.
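One thing I came across is np.memmap, which seems to let an array live in a file on disk while slices of it are read and written like a normal array. I'm not sure it's the right tool, but here is a minimal sketch of how I imagine filling g and index_stack chunk by chunk (again with small placeholder sizes, and with dummy per-chunk results standing in for the real calculation above):

import numpy as np

# small stand-in sizes; the real shapes would be (3_000_000, 10_000)
n_samp, n_rows = 3000, 100
chunk = 50

# on-disk arrays; only the slices that are touched get loaded into memory
g = np.memmap('g.dat', dtype=np.float32, mode='w+', shape=(n_samp, n_rows))
index_stack = np.memmap('index_stack.dat', dtype=np.int32, mode='w+', shape=(n_samp, n_rows))

for start in range(0, n_samp, chunk):
    stop = min(start + chunk, n_samp)
    # dummy placeholders -- the real f and index for samplez[start:stop]
    # would be computed here (e.g. with the chunked broadcasting above)
    f = np.zeros((stop - start, n_rows), dtype=np.float32)
    index = np.zeros((stop - start, n_rows), dtype=np.int32)
    g[start:stop] = f                 # written through to the file on disk
    index_stack[start:stop] = index

g.flush()
index_stack.flush()

Is something like that a sensible way to handle arrays this size, or is there a better approach? Thanks in advance.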