
Question

Please elaborate on the 2012 answer about Numpy array broadcasting rules, and clarify what trailing axes are, as I am not sure which "linked documentation page" that answer refers to. Perhaps it has changed in the last 8 years.

Since "axes" in "trailing axes" is plural, do the sizes of at least the last two axes have to match (unless one is singular)? If so, why at least two?

The given answer was:

Well, the meaning of trailing axes is explained on the linked documentation page. If you have two arrays with different numbers of dimensions, say one is 1x2x3 and the other 2x3, then you compare only the trailing common dimensions, in this case 2x3. But if both your arrays are two-dimensional, then their corresponding sizes have to be either equal or one of them has to be 1.

In your case you have a 2x2 and a 4x2, and 4 != 2 and neither 4 nor 2 equals 1, so this doesn't work.
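
For reference, a quick check of that 1x2x3 vs 2x3 case (a and b are just throwaway arrays of my own, assuming NumPy imported as np):

import numpy as np

a = np.ones((1, 2, 3))
b = np.ones((2, 3))
print("a * b shape is {}".format((a * b).shape))
---
a * b shape is (1, 2, 3)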

The error and the question raised were:

A = np.array([[1,2],[3,4]])
B = np.array([[2,3],[4,6],[6,9],[8,12]])
print("A.shape {}".format(A.shape))
print("B.shape {}".format(B.shape))
A*B
---
A.shape (2, 2)              # <---- The last axis size is 2 in both shapes.
B.shape (4, 2)              # <---- Apparently this "2" is not the size of trailing axis/axes

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-91-7a3f7e97944d> in <module>
      3 print("A.shape {}".format(A.shape))
      4 print("B.shape {}".format(B.shape))
----> 5 A*B

ValueError: operands could not be broadcast together with shapes (2,2) (4,2) 


Since both A and B have two columns, I would have thought this would work. 
So, I'm probably misunderstanding something here about the term "trailing axis", 
and how it applies to N-dimensional arrays.

References

The Broadcasting Rule
In order to broadcast, the size of the trailing axes for both arrays in an operation must either be the same size or one of them must be one.
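
As a quick illustration of that rule with throwaway arrays of ones (same np import as above): a (2,2) broadcasts with a (1,2) or a (2,1), but not with a (4,2).

C = np.ones((2, 2))
print((C * np.ones((1, 2))).shape)   # (2, 2): trailing axes 2 == 2, leading 1 broadcasts to 2
print((C * np.ones((2, 1))).shape)   # (2, 2): trailing 1 broadcasts to 2, leading 2 == 2
# C * np.ones((4, 2)) raises ValueError: leading axes are 2 and 4, neither equal nor one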


Update

My understanding, based on the reply by @Akshay Sehgal. Consider two arrays with A.shape = (4,5,1) and B.shape = (1,2).

A = np.arange(20).reshape((4, 5, 1))
B = np.arange(2).reshape((1,2))
print("A.shape {}".format(A.shape))
print("B.shape {}".format(B.shape))
---
A.shape (4, 5, 1)
B.shape (1, 2)

First, look at axis=-1: the size 1 in A is broadcast from 1 to 2, because it is singular, to match that of B. Then the size 1 in B at axis=-2 is broadcast from 1 (singular) to 5 to match that of A. The result is shape (4, 5, 2).

print("A * B shape is {}".format((A*B).shape))
---
A * B shape is (4, 5, 2)

Based on the answer from @hpaulj, here is a way to simulate the broadcasting manually.

print("A.shape {}".format(A.shape))
print("B.shape {}".format(B.shape))
---
A.shape (4, 5, 1)
B.shape (1, 2)

# Check ranks.
print("rank(A) {} rank(B) {}".format(A.ndim, B.ndim))
---
rank(A) 3 rank(B) 2

# Expand B because rank(B) < rank(A).
B = B[None, :, :]
B.shape
---
(1, 1, 2)

A:(4,5,1)
   ↑ ↑ ↓
B:(1,1,2)
----------
C:(4,5,2)
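
As a sanity check on this simulation (np.broadcast_to has been available since NumPy 1.10 or so, if I recall correctly), expanding both arrays to the full (4, 5, 2) shape gives the same result as letting A * B broadcast on its own:

A2 = np.broadcast_to(A, (4, 5, 2))   # repeat A along its last (size 1) axis
B2 = np.broadcast_to(B, (4, 5, 2))   # repeat B along its first two (size 1) axes
print(np.array_equal(A2 * B2, A * B))
---
True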
mon

2 Answers


Trailing axes are axis=-1, axis=-2, axis=-3 ... . Broadcasting rules compare trailing axes as opposed to leading axes (axis=0 onwards).

This is specifically for applying broadcasting to tensors of different dimensionality (say a 2D and a 3D tensor). Trailing axes basically indicate the direction in which axes are considered for the broadcasting rules. Imagine lining up the axes by shape. If you line them up by leading axes you would have something like the following -

Consider 2 arrays A.shape = (4,5,1) and B.shape = (1,2)

#Leading axes

A  04  05  01
B  01  02
--------------
No broadcasting
--------------

To consider trailing axes you would instead look at them as -

#Trailing axes

A  04  05  01
B      01  02
--------------
C  04  05  02
--------------

That's all the term trailing axes means in this context, i.e. align the shapes starting from the back rather than from the leading axes.

In other words, when broadcasting a (1,2) shaped array with, say, a higher dimensional array, we look at the trailing axes, which have size 2 for axis=-1 and then 1 for axis=-2, working in reverse order.
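
A quick way to check this trailing alignment directly (np.broadcast_shapes needs a fairly recent NumPy, roughly 1.20+; older versions can use np.broadcast on the arrays themselves):

import numpy as np

np.broadcast_shapes((4, 5, 1), (1, 2))   # -> (4, 5, 2)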

Akshay Sehgal
  • And because the rank of A (3) > rank of B(2), two trailing axes from B must match to apply broadcasting? – mon Dec 24 '20 at 08:49
  • exactly, match or be 1 for broadcasting to apply. `In order to broadcast, the size of the trailing axes for both arrays in an operation must either be the same size or one of them must be one.` Hope this clarifies your question. – Akshay Sehgal Dec 24 '20 at 08:56
  • Thanks a lot. I wish your explanation were in the Numpy documentation. – mon Dec 24 '20 at 09:01
  • Glad to help anytime. – Akshay Sehgal Dec 24 '20 at 09:02

The way I explain broadcasting puts less focus on trailing axes, and more on two rules:

  • match the number of dimensions by adding leading size 1 dimensions
  • scale all size 1 dimensions to match

In that example, pared down:

In [233]: A = np.arange(20).reshape((4, 5))
     ...: B = np.arange(2)
In [234]: A
Out[234]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
In [235]: B
Out[235]: array([0, 1])
In [236]: A*B
Traceback (most recent call last):
  File "<ipython-input-236-47896efed660>", line 1, in <module>
    A*B
ValueError: operands could not be broadcast together with shapes (4,5) (2,) 

By the first rule the (2,) is expanded to (1,2), and its leading 1 could then grow to give (4,2), but that's a dead end: the trailing 2 can't match 5.

But if we add a dimension to A, making it (4,5,1):

In [237]: A[:,:,None]*B
Out[237]: 
array([[[ 0,  0],
        [ 0,  1],
        [ 0,  2],
        ...
        [ 0, 19]]])
In [238]: _.shape
Out[238]: (4, 5, 2)

Now the (2,) expands to (1,1,2), which works with the (4,5,1).

Starting with (1,2) for B also works:

In [240]: (A[:,:,None]*B[None,:]).shape
Out[240]: (4, 5, 2)

Broadcasting can add as many leading dimensions to B as needed, but it can't automatically add trailing dimensions to A. We have to do that ourselves. reshape works fine for adding dimensions, but I think the None/newaxis idiom better highlights the addition.
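
For example (reusing the (4, 5) A and (2,) B from the session above), the two spellings should give identical arrays:

np.array_equal(A.reshape(4, 5, 1) * B, A[:, :, None] * B)   # expect True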

This behavior could be explained in terms of trailing axes (doesn't have to be plural), but I think the two step explanation is clearer.
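
As a rough sketch of those two steps (a hypothetical helper, not NumPy's actual implementation):

def broadcast_shape(shape1, shape2):
    # Step 1: match the number of dimensions by adding leading size 1 dimensions.
    ndim = max(len(shape1), len(shape2))
    s1 = (1,) * (ndim - len(shape1)) + tuple(shape1)
    s2 = (1,) * (ndim - len(shape2)) + tuple(shape2)
    # Step 2: each pair of sizes must match, or one of them must be 1 (and is scaled up).
    out = []
    for d1, d2 in zip(s1, s2):
        if d1 != d2 and 1 not in (d1, d2):
            raise ValueError("cannot broadcast {} with {}".format(shape1, shape2))
        out.append(max(d1, d2))
    return tuple(out)

broadcast_shape((4, 5, 1), (2,))    # (4, 5, 2)
# broadcast_shape((2, 2), (4, 2))   # ValueError, as in the original question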

There are, I think, two reasons for the distinction between leading and trailing axes: leading axes are the outermost ones (at least for C order), and aligning on the trailing axes avoids ambiguity.

Consider using (3,) and (2,) together. We could form (3,2) or (2,3) arrays from them, but which?

In [241]: np.array([1,2,3])*np.array([4,5])
Traceback (most recent call last):
  File "<ipython-input-241-eaf3e99b50a9>", line 1, in <module>
    np.array([1,2,3])*np.array([4,5])
ValueError: operands could not be broadcast together with shapes (3,) (2,) 

In [242]: np.array([1,2,3])[:,None]*np.array([4,5])
Out[242]: 
array([[ 4,  5],
       [ 8, 10],
       [12, 15]])

In [243]: np.array([1,2,3])*np.array([4,5])[:,None]
Out[243]: 
array([[ 4,  8, 12],
       [ 5, 10, 15]])

The explicit trailing None clearly identifies which we want. We could add a [None,:] but it isn't necessary.
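
For completeness, spelling out that leading None should reproduce the (2,3) result above:

np.array([1, 2, 3])[None, :] * np.array([4, 5])[:, None]
# array([[ 4,  8, 12],
#        [ 5, 10, 15]])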

hpaulj
  • Thanks a lot for this way of thinking. Matching the rank by adding leading dimensions makes perfect sense! – mon Dec 24 '20 at 21:41
  • Since the question was about trailing axes in broadcasting and I had already accepted another answer, I cannot change the acceptance, but I really appreciate this perspective on broadcasting. – mon Dec 24 '20 at 22:07