This is a performance question: I am trying to optimize the following double for loop. Here is an MWE:
import numpy as np
from timeit import default_timer as tm
# L1 and L2 will range from 0 to 3 typically, sometimes up to 5
# all of the following are dummy values but match correct `type`
L1, L2, x1, x2, fac = 2, 3, 2.0, 4.5, 2.3
saved_values = np.random.uniform(high=75.0, size=[max(L1,L2) + 1, max(L1,L2) + 1])
facts = np.random.uniform(high=65.0, size=[L1 + L2 + 1])
val = 0
start = tm()
for i in range(L1 + 1):
    sf = saved_values[L1][i] * x1 ** (L1 - i)
    for j in range(L2 + 1):
        m = i + j
        if m % 2 == 0:
            num = sf * facts[m] / (2 * fac) ** (m / 2)
            val += saved_values[L2][j] * x1 ** (L1 - j) * num
end = tm()
time = end-start
print("Long way: time taken was {} and value is {}".format(time, val))
My idea for a solution is to take out the `if m % 2 == 0:` statement and then calculate all `i` and `j` combinations, i.e., a matrix, which I should be able to vectorize, and then use something like `np.where()` to add up all of the elements meeting the requirement `m % 2 == 0`, where `m = i + j`.
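For concreteness, the masked-matrix idea above might be sketched roughly as follows. This is only a sketch under assumptions, not a settled solution: the function names `double_loop` and `masked_sum` are made up for illustration, the inner term is kept exactly as written in the MWE (`x1 ** (L1 - j)`), and `x1` is cast to float so that negative integer exponents do not raise an error.

```python
import numpy as np

def double_loop(L1, L2, x1, fac, saved_values, facts):
    """Reference: the explicit double loop from the MWE (hypothetical wrapper)."""
    val = 0.0
    for i in range(L1 + 1):
        sf = saved_values[L1][i] * x1 ** (L1 - i)
        for j in range(L2 + 1):
            m = i + j
            if m % 2 == 0:
                num = sf * facts[m] / (2 * fac) ** (m / 2)
                val += saved_values[L2][j] * x1 ** (L1 - j) * num
    return val

def masked_sum(L1, L2, x1, fac, saved_values, facts):
    """Build the full (i, j) matrix of terms, then keep only even m = i + j."""
    x1 = float(x1)                       # assumption: avoid int ** negative int
    i = np.arange(L1 + 1)
    j = np.arange(L2 + 1)
    # Part that depends only on i (the `sf` factor), shape (L1+1,)
    a = saved_values[L1, :L1 + 1] * x1 ** (L1 - i)
    # Part that depends only on j, shape (L2+1,); same expression as the MWE
    b = saved_values[L2, :L2 + 1] * x1 ** (L1 - j)
    # Part that depends on m = i + j, shape (L1+1, L2+1)
    m = i[:, None] + j[None, :]
    c = facts[m] / (2 * fac) ** (m / 2)
    terms = a[:, None] * b[None, :] * c
    # Zero out the odd-m entries instead of branching, then add everything up
    return np.where(m % 2 == 0, terms, 0.0).sum()

# Dummy inputs shaped like the MWE, with a fixed seed so the check is repeatable
rng = np.random.default_rng(0)
L1, L2, x1, fac = 2, 3, 2.0, 2.3
saved_values = rng.uniform(high=75.0, size=[max(L1, L2) + 1, max(L1, L2) + 1])
facts = rng.uniform(high=65.0, size=[L1 + L2 + 1])

ref = double_loop(L1, L2, x1, fac, saved_values, facts)
vec = masked_sum(L1, L2, x1, fac, saved_values, facts)
print("loop and masked results agree:", np.isclose(ref, vec))
```

The point of the split is that `sf` depends only on `i`, the second factor only on `j`, and the `facts[m] / (2 * fac) ** (m / 2)` factor only on `m`, so the three pieces can be built separately and combined by broadcasting. For matrices this small the explicit loop may well be just as fast for a single scalar call.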
Even if this is not faster than the explicit for loops for a single call, it should still be vectorized: in reality I will be passing arrays to a function containing the double for loop, so vectorizing that part should give me the speed gains I am after, even if vectorizing the double for loop itself does not.
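To illustrate the array-input case, here is a hedged sketch that assumes only `x1` arrives as a 1-D array while `L1`, `L2`, `fac`, `saved_values`, and `facts` stay fixed; which arguments are actually arrays in the real function is not stated here, so treat that as an assumption. The names `scalar_loop` and `masked_sum_batch` are hypothetical, and the inner term again follows the MWE as written.

```python
import numpy as np

def scalar_loop(L1, L2, x1, fac, saved_values, facts):
    """Reference: the MWE double loop for a single scalar x1."""
    val = 0.0
    for i in range(L1 + 1):
        sf = saved_values[L1][i] * x1 ** (L1 - i)
        for j in range(L2 + 1):
            m = i + j
            if m % 2 == 0:
                num = sf * facts[m] / (2 * fac) ** (m / 2)
                val += saved_values[L2][j] * x1 ** (L1 - j) * num
    return val

def masked_sum_batch(L1, L2, x1, fac, saved_values, facts):
    """Evaluate the double sum for every element of x1 at once."""
    x1 = np.atleast_1d(np.asarray(x1, dtype=float))            # shape (N,)
    i = np.arange(L1 + 1)
    j = np.arange(L2 + 1)
    # i-dependent and j-dependent factors, one row per x1 value
    a = saved_values[L1, :L1 + 1] * x1[:, None] ** (L1 - i)    # (N, L1+1)
    b = saved_values[L2, :L2 + 1] * x1[:, None] ** (L1 - j)    # (N, L2+1)
    # m-dependent factor does not depend on x1, so it is built once;
    # odd-m entries are masked to zero in place of the `if`
    m = i[:, None] + j[None, :]
    c = np.where(m % 2 == 0, facts[m] / (2 * fac) ** (m / 2), 0.0)
    # Sum a[n, i] * c[i, j] * b[n, j] over i and j for each n
    return np.einsum("ni,ij,nj->n", a, c, b)

# Dummy inputs shaped like the MWE; xs avoids 0 because of negative exponents
rng = np.random.default_rng(1)
L1, L2, fac = 2, 3, 2.3
saved_values = rng.uniform(high=75.0, size=[max(L1, L2) + 1, max(L1, L2) + 1])
facts = rng.uniform(high=65.0, size=[L1 + L2 + 1])
xs = rng.uniform(0.5, 5.0, size=1000)

out = masked_sum_batch(L1, L2, xs, fac, saved_values, facts)
expected = np.array([scalar_loop(L1, L2, x, fac, saved_values, facts) for x in xs])
print("batch matches per-element loop:", np.allclose(out, expected))
```

Because the masked `c` matrix is independent of `x1`, the per-element work reduces to two small power tables and one contraction, which is where any speedup over calling the loop once per array element would come from.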
I am stuck spinning my wheels right now on how to broadcast while accounting for the `sf` factor as well as the `m` factor in the inner loop.