0

Suppose I have 2 3D matrices A and B

A.shape = [ 2, 50, 60] B.shape = [ 3, 50, 60]

conceptually, I see the matrices like column vectors

A = [ a0, a1 ] where a0, a1 are matrices of shape [50,60]

[ a0
  a1 ]  

B = [ b0, b1 , b2] where b0, b1 , b2 are matrices of shape [50,60]

[ b0
  b1   
  b2 ]  

How can I create the resulting matrix of shape [ 6, 50 , 60 ] conceptually as below

A op B =

[ a0 * b0 , 
  a0 * b1 ,
  a0 * b2 ,
  a1 * b0 ,
  a1 * b1 ,
  a1 * b2 ]

where a0 * b0 is elementwise product

Is there a name for this operation? How can I efficiently compute this in numpy sequence of function or einsum for any size of A [ na, nx, ny] and B [ nb, nx, ny ] ?

palazzo train
  • 3,229
  • 1
  • 19
  • 40

2 Answers2

1

I think I found how to do it by einsum and it is much faster.

Let's generalize the input A and B by below

na = 2
nb = 3
nx = 50
ny = 60
A = np.random.random((na,nx,ny)) 
B = np.random.random((nb,nx,ny))

By Einstein notation, the operation could be written as this einsum function:

def f_einstein(A,B):
    a = A[:,np.newaxis, :,:]
    b = B[np.newaxis,:, :,:]
    einstein = np.einsum( 'ikxy,kjxy->ijxy' , a , b )
    return einstein.reshape( na*nb,nx,ny)

Thanks for @Bobby Ocean, the iterative method is :

import itertools as it
def f_iter(A,B):
    return np.stack([a*b for a,b in it.product(A,B)])

To verify, the below return True

result_eintein = f_einstein(A,B)
result_iter = f_iter(A,B)
(result_eintein==result_iter).all()

The performance of einsum vs iter can be 1.5 - 6 times faster

when na = 2, nb = 3. It's 1.5 time faster

%timeit result_eintein = f_einstein(A,B)
The slowest run took 6.33 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 34.3 µs per loop

%timeit result_iter = f_iter(A,B)
The slowest run took 11.74 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 51.3 µs per loop

when na = 200, nb = 300. It's 6 time faster

%timeit result_eintein = f_einstein(A,B)
1 loop, best of 3: 1.2 s per loop

%timeit result_iter = f_iter(A,B)
1 loop, best of 3: 7.41 s per loop

My comments on it to help people understanding:

The operation I am doing is actually something like Batch-Outer-Hadamard-Product. It is batched by the outer product of the first dimension of A and B, followed by Hadamard product which is an elementwise product of the [50,60] matrices.

Because it involves a outer product like operation, the axis of the original A and B has to be in different dimension. That's why after introduce new axis for the outer product to work:

a = A[:,np.newaxis, :,:]
b = B[np.newaxis,:, :,:]

The new shape is now [2,1,50,60] and [1,3,50,60]. Matrices a and b can do an outer product by einsum easily on the first 2 dimensions. In (partial) einstein notation it is just 'ik,kj->ij'.

For the last hadamard product, the (partial) einstein notation is simply 'xy,xy->xy'.

Combining the two, the final einstein notation is 'ikxy,kjxy->ijxy'. The only last operation needed for einsum is to reshape from [i,j] to [i*j].

Credit to Tim Rocktäschel : I read a very good page from Tim Rocktäschel https://rockt.github.io/2018/04/30/einsum . The way he colors the matrices make me very easy to understand einsum. And yes, I want to master the einsum for my deep learning projects

palazzo train
  • 3,229
  • 1
  • 19
  • 40
0

This is like a mixture of a cross product and a coordinate wise multiply. Not sure if there is any more efficient way to do it, than what you have.

import itertools as it

A = np.random.random((2,50,60)) 
B = np.random.random((3,50,60))
C = np.stack([a*b for a,b in it.product(A,B)])
print(C.shape)

You could duplicate the data, like:

a,b = list(zip(*it.product(range(len(A)),range(len(B))))) # a=(0,0,0,1,1,1) b=(0,1,2,0,1,2)
D   = A[list(a)] * B[list(b)]
print(D.shape)
print((C == D).all())
Bobby Ocean
  • 3,120
  • 1
  • 8
  • 15
  • thanks. but it is essentially a loop there. Is there a vectorization way of calculation? – palazzo train May 20 '20 at 07:15
  • I am unsure. But even if there was, the first solution does not appear to have any additional calculations performed above what you would have to do anyway. Let me put it this way. The memory for A and B alone is clearly not sufficient to house the memory for C (hence no in place operations are possible); hence you are guaranteed to have to reallocate new memory space; this leads me to think that my first answer is the correct solution and probably the fastest. – Bobby Ocean May 20 '20 at 19:24
  • do you know how to do it by einsum? I think this question can help people understand the notation. – palazzo train May 21 '20 at 03:00
  • i think i find the answer. You can see my answer here. – palazzo train May 22 '20 at 06:15