2

Good evening,

I need some help understanding advanced broadcasting with large NumPy arrays.

I have:

array A: 50000x2000

array B: 2000x10x10

Implementation with for loop:

for k in range(50000):
    temp = A[k, :].reshape(2000, 1, 1)
    finalarray[k, :, :] = np.sum(B * temp, axis=0)

I want an element-wise multiplication followed by a summation over the axis with 2000 elements, giving the end product:

finalarray: 50000x10x10

Is it possible to avoid the for loop? Thank you!

PM 2Ring
torpedo

2 Answers


For something like this I'd use np.einsum, which makes it pretty easy to write down what you want to happen in terms of the index actions you want:

fast = np.einsum('ij,jkl->ikl', A, B)

which gives me the same result (dropping 50000->500 so the loopy one finishes quickly):

A = np.random.random((500, 2000))
B = np.random.random((2000, 10, 10))
finalarray = np.zeros((500, 10, 10))
for k in range(500):
    temp = A[k, :].reshape(2000, 1, 1)
    finalarray[k, :, :] = np.sum(B * temp, axis=0)

fast = np.einsum('ij,jkl->ikl', A, B)

gives me

In [81]: (finalarray == fast).all()
Out[81]: True

and reasonable performance even in the 50000 case:

In [88]: %time fast = np.einsum('ij,jkl->ikl', A, B)
Wall time: 4.93 s

In [89]: fast.shape

Out[89]: (50000, 10, 10)

Alternatively, in this case, you could use tensordot:

faster = np.tensordot(A, B, axes=1)

which will be a few times faster (at the cost of being less general):

In [29]: A = np.random.random((50000, 2000))

In [30]: B = np.random.random((2000, 10, 10))

In [31]: %time fast = np.einsum('ij,jkl->ikl', A, B)
Wall time: 5.08 s

In [32]: %time faster = np.tensordot(A, B, axes=1)
Wall time: 504 ms

In [33]: np.allclose(fast, faster)
Out[33]: True

I had to use allclose here because the values wind up being very slightly different:

In [34]: abs(fast - faster).max()
Out[34]: 2.7853275241795927e-12
DSM
  • An alternative but slower than tensordot is `result = A @ np.transpose(B, (1,0,2))`. It is roughly the same speed as the einsum solution in this case. – user2653663 Sep 06 '18 at 19:32
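One caveat worth noting about the comment's matmul route: with a 2-d `A` and a 3-d operand, `np.matmul` puts the broadcast (length-10) axis first, so a final transpose is needed to match the loop's `(50000, 10, 10)` layout. A quick check with shrunk sizes (shapes here are made up for illustration):

```python
import numpy as np

# Shrunk stand-ins: A is (50000, 2000) -> (5, 7), B is (2000, 10, 10) -> (7, 3, 3)
A = np.random.random((5, 7))
B = np.random.random((7, 3, 3))

# matmul broadcasts A against the stack of 3 matrices in np.transpose(B, (1, 0, 2)),
# so the batch axis comes out first: shape (3, 5, 3)
out = A @ np.transpose(B, (1, 0, 2))

# Swap the first two axes to recover the loop's (i, k, l) layout: shape (5, 3, 3)
result = np.transpose(out, (1, 0, 2))

ref = np.einsum('ij,jkl->ikl', A, B)
print(np.allclose(result, ref))   # True
```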

This should work:

(A[:, :, None, None] * B[None, :, :]).sum(axis=1)

But it will blow up your memory for the intermediate array created by the product.

The product has shape (50000, 2000, 10, 10), thus contains 10 billion elements, which is 80 GB for 64 bit floating point values.
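With shrunk sizes (made up here so the intermediate fits in memory), both the equivalence with the loop and the shape of that intermediate are easy to check:

```python
import numpy as np

# Shrunk stand-ins: A is (50000, 2000) -> (5, 7), B is (2000, 10, 10) -> (7, 3, 3)
A = np.random.random((5, 7))
B = np.random.random((7, 3, 3))

# Broadcasting approach: the intermediate product has shape (5, 7, 3, 3)
prod = A[:, :, None, None] * B[None, :, :]
result = prod.sum(axis=1)

# Reference loop from the question
ref = np.zeros((5, 3, 3))
for k in range(5):
    ref[k, :, :] = np.sum(B * A[k, :].reshape(7, 1, 1), axis=0)

print(prod.shape)                 # (5, 7, 3, 3)
print(np.allclose(result, ref))   # True
```

At the original sizes that intermediate grows to (50000, 2000, 10, 10), which is where the 80 GB comes from.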

rocksportrocker