
Consider the following matrix product between two arrays:

import numpy as np
A = np.random.rand(2,10,10)                                             
B = np.random.rand(2,2)                                                 
C = A.T @ B

...this works fine. I think of the above as a 1-by-2 times 2-by-2 vector-matrix product broadcast over the 10-by-10 2nd and 3rd dimensions of A. Inspection of the result C confirms this intuition: np.allclose(C[i,j], A.T[i,j] @ B) holds for all i, j.
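That check can be written out explicitly; a small sketch with seeded random data so the run is reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((2, 10, 10))
B = rng.random((2, 2))

C = A.T @ B          # A.T has shape (10, 10, 2), so C is (10, 10, 2)

# matmul treats the leading (10, 10) axes of A.T as batch dimensions,
# so each C[i, j] is the length-2 vector A.T[i, j] times the 2x2 matrix B
for i in range(10):
    for j in range(10):
        assert np.allclose(C[i, j], A.T[i, j] @ B)
```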

Now, mathematically, I should be able to compute C.T as B.T @ A, but:

B.T @ A                                                                
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-ffdbb14ca160> in <module>
----> 1 B.T @ A

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 10 is different from 2)

So broadcast-wise, a 10-by-10-by-2 tensor and a 2-by-2 matrix are compatible with respect to matrix product, but a 2-by-2 matrix and 2-by-10-by-10 tensor are not?

Bonus info: I want to be able to compute the "quadratic product" A.T @ B @ A and it really annoys me to have to write for-loops to manually "broadcast" over one of the dimensions. It feels like it should be possible to do this more elegantly. I am pretty experienced with Python and NumPy, but I rarely go beyond two-dimensional arrays.

What am I missing here? Is there something about the way transpose operates on tensors in NumPy that I do not understand?

Thomas Arildsen
  • The catch phrase: the last dim of A pairs with the 2nd-to-last of B. The first dim of a 3d array is a 'batch' dimension. – hpaulj Oct 01 '19 at 20:33
  • I think I found a hint here: https://stackoverflow.com/a/22973830/865169. It seems `np.einsum` can do the job: `C_transpose = np.einsum('ij,jkl', B.T, A)`. I have just still not figured out how to configure `np.einsum` to compute `A.T @ B @ A`, which is my ultimate goal. – Thomas Arildsen Oct 01 '19 at 21:11
  • `einsum` can have more than two terms: `np.einsum('i...,ij,j...',A,B,A)` note the use of Ellipsis for dimensions you want aligned but not summed over. – Paul Panzer Oct 01 '19 at 22:13
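Putting that comment's suggestion into a runnable sketch (shapes as in the question; the explicit loop is only there to confirm what the einsum computes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((2, 10, 10))
B = rng.random((2, 2))

# einsum with Ellipsis: i and j each appear twice, so both are summed;
# the '...' axes (10, 10) are aligned across the operands and kept
Q = np.einsum('i...,ij,j...', A, B, A)      # shape (10, 10)

# loop equivalent: per (m, n), the quadratic form a @ B @ a with a = A[:, m, n]
Q_loop = np.empty((10, 10))
for m in range(10):
    for n in range(10):
        a = A[:, m, n]
        Q_loop[m, n] = a @ B @ a

assert np.allclose(Q, Q_loop)
```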

1 Answer

In [194]: A = np.random.rand(2,10,10)
     ...: B = np.random.rand(2,2)
In [196]: A.T.shape                                                             
Out[196]: (10, 10, 2)

In [197]: C = A.T @ B                                                           
In [198]: C.shape                                                               
Out[198]: (10, 10, 2)

The einsum equivalent is:

In [199]: np.allclose(np.einsum('ijk,kl->ijl',A.T,B),C)                         
Out[199]: True

or incorporating the transpose into the indexing:

In [200]: np.allclose(np.einsum('kji,kl->ijl',A,B),C)                           
Out[200]: True

Note that k is the summed dimension. j and l are other dot dimensions. i is a kind of 'batch' dimension.

Or, written per element as you describe it: np.einsum('k,kl->l', A.T[i,j], B).
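That per-element view can be checked directly; a sketch with arbitrary sample indices (the particular i, j are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((2, 10, 10))
B = rng.random((2, 2))
C = A.T @ B

i, j = 3, 7   # arbitrary sample indices
# A.T[i, j] is a length-2 vector; contract its single axis k with B's rows
assert np.allclose(np.einsum('k,kl->l', A.T[i, j], B), C[i, j])
```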

To get C.T, the einsum result indices should be lji, or lk,jki->lji:

In [201]: np.allclose(np.einsum('lk,jki->lji', B.T, A.transpose(1,0,2)), C.T)      
Out[201]: True

In [226]: np.allclose(np.einsum('ij,jkl->ikl', B.T, A), C.T)                       
Out[226]: True

Matching [201] with @ requires a further transpose:

In [225]: np.allclose((B.T@(A.transpose(1,0,2))).transpose(1,0,2), C.T)          
Out[225]: True

With einsum we can place the axes in any order, but with matmul the order is fixed: (batch, i, k) @ (batch, k, l) -> (batch, i, l) (where the batch dimensions can be broadcast).

Your example might be clearer if A had shape (2,10,9) and B shape (2,3), so C comes out as (9,10,3):

In [229]: A = np.random.rand(2,10,9); B = np.random.rand(2,3)                   
In [230]: C = A.T @ B                                                           
In [231]: C.shape                                                               
Out[231]: (9, 10, 3)
In [232]: C.T.shape                                                             
Out[232]: (3, 10, 9)

In [234]: ((B.T) @ (A.transpose(1,0,2))).shape                                    
Out[234]: (10, 3, 9)
In [235]: ((B.T) @ (A.transpose(1,0,2))).transpose(1,0,2).shape                   
Out[235]: (3, 10, 9)
In [236]: np.allclose(((B.T) @ (A.transpose(1,0,2))).transpose(1,0,2), C.T)        
Out[236]: True
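As for the quadratic product in the question, @ alone can do that too, if we add singleton matrix axes so each (i, j) slot holds a 1-by-2 row on the left and a 2-by-1 column on the right. A sketch with the question's original shapes (B must be square, (2, 2), for a quadratic form):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((2, 10, 10))
B = rng.random((2, 2))

# (10, 10, 1, 2) @ (2, 2) @ (10, 10, 2, 1) -> (10, 10, 1, 1), then squeeze
q = (A.T[:, :, None, :] @ B @ A.T[:, :, :, None])[..., 0, 0]

# per-element check against the plain vector expression
assert np.allclose(q[3, 7], A.T[3, 7] @ B @ A.T[3, 7])
```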
hpaulj