
I have to apply a mathematical formula that I've written in Python as:

    for s in range(tdim):
        sum1 = 0.0
        for i in range(dim):
            for j in range(dim):
                sum1 += (0.5*np.cos(theta[s]*(i-j))*(eig1[i]*eig1[j]+eig2[i]+eig2[j])
                         - 0.5*np.sin(theta[s]*(i-j))*(eig1[j]*eig2[i]-eig1[i]*eig2[j]))

        PHi2.append(sum1)
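
Written out, the quantity I want for each s is (with the parenthesisation as in the loop above):

$$
\mathrm{PHi2}[s] = \sum_{i=0}^{d-1}\sum_{j=0}^{d-1}\left[\tfrac{1}{2}\cos\big(\theta_s(i-j)\big)\big(\mathrm{eig1}_i\,\mathrm{eig1}_j+\mathrm{eig2}_i+\mathrm{eig2}_j\big)-\tfrac{1}{2}\sin\big(\theta_s(i-j)\big)\big(\mathrm{eig1}_j\,\mathrm{eig2}_i-\mathrm{eig1}_i\,\mathrm{eig2}_j\big)\right]
$$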

Now, this is correct, but clearly inefficient; the other way to do it is:

    for i in range(dim):
        for j in range(dim):
            PHi2 = (0.5*np.cos(theta*(i-j))*(eig1[i]*eig1[j]+eig2[i]+eig2[j])
                    - 0.5*np.sin(theta*(i-j))*(eig1[j]*eig2[i]-eig1[i]*eig2[j]))

However, the second example gives me the same number in all elements of PHi2, so this is faster but the answer is wrong. How can I do this correctly and more efficiently?

NOTE: eig1 and eig2 have the same dimension d, and theta and PHi2 have the same dimension D, but d != D.

user2820579

3 Answers


You can use a brute force broadcasting approach, but it creates an intermediate array of shape (D, d, d), which can get out of hand if your arrays are even moderately large. Furthermore, broadcasting with no refinements recomputes a lot of the innermost-loop calculations that only need to be done once. If you first compute the necessary terms for all possible values of i - j and add them together, you can reuse those values in the outer loop, e.g.:

def fast_ops(eig1, eig2, theta):
    d = len(eig1)
    d_arr = np.arange(d)
    i_j = d_arr[:, None] - d_arr[None, :]          # all offsets i - j, shape (d, d)
    reidx = i_j + d - 1                            # shift offsets to bin labels 0 .. 2*d - 2
    mult1 = eig1[:, None] * eig1[None, :] + eig2[:, None] + eig2[None, :]
    mult2 = eig1[None, :] * eig2[:, None] - eig1[:, None] * eig2[None, :]
    # add up the terms that share the same i - j, so cos/sin are applied
    # once per offset instead of once per (i, j) pair
    mult1_reidx = np.bincount(reidx.ravel(), weights=mult1.ravel())
    mult2_reidx = np.bincount(reidx.ravel(), weights=mult2.ravel())

    angles = theta[:, None] * np.arange(1 - d, d)  # shape (D, 2*d - 1)

    return 0.5 * (np.einsum('ij,j->i', np.cos(angles), mult1_reidx) -
                  np.einsum('ij,j->i', np.sin(angles), mult2_reidx))
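
The key step is the np.bincount reduction: it adds up every entry of mult1 (and mult2) whose indices share the same offset i - j, so cos and sin only have to be applied once per offset rather than once per (i, j) pair. A minimal standalone sketch of that grouping, using a made-up toy array vals purely for illustration:

    import numpy as np

    d = 3
    idx = np.arange(d)
    offsets = idx[:, None] - idx[None, :]    # i - j for every (i, j), values -(d-1) .. d-1
    labels = offsets + d - 1                 # shifted to non-negative bin labels 0 .. 2*d - 2
    vals = np.arange(d * d, dtype=float).reshape(d, d)   # toy stand-in for mult1

    # one summed value per distinct offset, i.e. per diagonal of vals
    per_offset = np.bincount(labels.ravel(), weights=vals.ravel())
    print(per_offset)                        # [ 2.  6. 12. 10.  6.]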

If we rewrite M4rtini's code as a function for comparison:

def fast_ops1(eig1, eig2, theta):
    d = len(eig1)
    D = len(theta)
    s = np.array(range(D))[:, None, None]
    i = np.array(range(d))[:, None]
    j = np.array(range(d))
    ret = 0.5 * (np.cos(theta[s]*(i-j))*(eig1[i]*eig1[j]+eig2[i]+eig2[j]) -
                 np.sin(theta[s]*(i-j))*(eig1[j]*eig2[i]-eig1[i]*eig2[j]))
    return ret.sum(axis=(-1, -2))

And we make up some data:

d, D = 100, 200
eig1 = np.random.rand(d)
eig2 = np.random.rand(d)
theta = np.random.rand(D)

The speed improvement is very noticeable: about 80x on top of the 115x over your original code, leading to a whopping ~9000x speed-up:

In [22]: np.allclose(fast_ops1(eig1, eig2, theta), fast_ops(eig1, eig2, theta))
Out[22]: True

In [23]: %timeit fast_ops1(eig1, eig2, theta)
10 loops, best of 3: 145 ms per loop

In [24]: %timeit fast_ops(eig1, eig2, theta)
1000 loops, best of 3: 1.85 ms per loop
Jaime

This works by broadcasting.
For tdim = 200 and dim = 100:
14 seconds with the original,
120 ms with this version.

s = np.array(range(tdim))[:, None, None]
i = np.array(range(dim))[:, None]
j = np.array(range(dim))
PHi2 = (0.5*np.cos(theta[s]*(i-j))*(eig1[i]*eig1[j]+eig2[i]+eig2[j]) -
        0.5*np.sin(theta[s]*(i-j))*(eig1[j]*eig2[i]-eig1[i]*eig2[j])).sum(axis=2).sum(axis=1)
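
For what it's worth, here is a minimal sketch of how the three index arrays broadcast together (toy sizes picked just to show the shapes):

    import numpy as np

    tdim, dim = 4, 3                      # toy sizes for illustration
    s = np.arange(tdim)[:, None, None]    # shape (tdim, 1, 1)
    i = np.arange(dim)[:, None]           # shape (dim, 1)
    j = np.arange(dim)                    # shape (dim,)

    # s varies along axis 0, i along axis 1, j along axis 2,
    # so the combined expression has shape (tdim, dim, dim)
    print((s * (i - j)).shape)            # (4, 3, 3)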
M4rtini

In the first bit of code, you have 0.5*np.cos(theta[s]*(i-j))... but in the second it's 0.5*np.cos(theta*(i-j)).... Unless you've got theta defined differently for the second bit of code, this could well be the cause of the trouble.

Edward
  • This is true: in the second case the whole array theta is fed through the multiplication and the cosine, since I want each element of theta to be considered while I'm running the sums over i and j. – user2820579 Jan 16 '14 at 23:53