To really get into broadcasting details you need to understand array shape and strides. But a lot of the work is now implemented in c
code using nditer
. You can read about it at http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html. np.nditer
gives you access to the tool at the Python level, but its real value comes when used with cython
or your own c
code.
np.lib.stride_tricks
has functions that let you play with strides. One of its functions helps visualize how arrays are broadcasted together. In practice the work is done with nditer
, but this function may help understand the action:
In [629]: np.lib.stride_tricks.broadcast_arrays(np.arange(6).reshape(2,3),
np.array([[1],[2]]))
Out[629]:
[array([[0, 1, 2],
[3, 4, 5]]),
array([[1, 1, 1],
[2, 2, 2]])]
Note that, effectively the 2nd array has been replicated to match the 1st's shape. But the replication is done with stride tricks, not with actual copies.
In [631]: A,B=np.lib.stride_tricks.broadcast_arrays(np.arange(6).reshape(2,3),
np.array([[1],[2]]))
In [632]: A.shape
Out[632]: (2, 3)
In [633]: A.strides
Out[633]: (12, 4)
In [634]: B.shape
Out[634]: (2, 3)
In [635]: B.strides
Out[635]: (4, 0)
It's this (4,0)
strides that does the replication without copy.
=================
Using the python level nditer
, here's what it does during broadcasting.
In [1]: A=np.arange(6).reshape(2,3)
In [2]: B=np.array([[1],[2]])
The plain nditer feeds elements one set at a time
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#using-an-external-loop
In [5]: it =np.nditer((A,B))
In [6]: for a,b in it:
...: print(a,b)
0 1
1 1
2 1
3 2
4 2
5 2
But when I turn extenal_loop
on, it iterates in chunks, here respective rows of the broadcasted arrays:
In [7]: it =np.nditer((A,B), flags=['external_loop'])
In [8]: for a,b in it:
...: print(a,b)
[0 1 2] [1 1 1]
[3 4 5] [2 2 2]
With a more complex broadcasting the external_loop
still produces 1d arrays that allow simple c
style iteration:
In [13]: A1=np.arange(24).reshape(3,2,4)
In [18]: it =np.nditer((A1,np.arange(3)[:,None,None]), flags=['external_loop'])
In [19]: while not it.finished:
...: print(it[:])
...: it.iternext()
...:
(array([0, 1, 2, 3, 4, 5, 6, 7]), array([0, 0, 0, 0, 0, 0, 0, 0]))
(array([ 8, 9, 10, 11, 12, 13, 14, 15]), array([1, 1, 1, 1, 1, 1, 1, 1]))
(array([16, 17, 18, 19, 20, 21, 22, 23]), array([2, 2, 2, 2, 2, 2, 2, 2]))
Note that while A1
is (3,2,4), the nditer loop yields 3 steps (1st axis) with 2*4 length elements.
I found in another cython/nditer
SO question that the first approach did not produce much of a speed improvement, but the 2nd helped a lot. In c
or cython
the external_loop
case would do simple low level iteration.
===============
If I broadcast on the 1 and 3rd axis, the iterator takes 2*3 steps (effectively flattening the 1st 2 axes, and feeding the 3rd):
In [20]: it =np.nditer((A1,np.arange(2)[None,:,None]), flags=['external_loop'])
In [21]: while not it.finished:
...: print(it[:])
...: it.iternext()
...:
(array([0, 1, 2, 3]), array([0, 0, 0, 0]))
(array([4, 5, 6, 7]), array([1, 1, 1, 1]))
(array([ 8, 9, 10, 11]), array([0, 0, 0, 0]))
(array([12, 13, 14, 15]), array([1, 1, 1, 1]))
(array([16, 17, 18, 19]), array([0, 0, 0, 0]))
(array([20, 21, 22, 23]), array([1, 1, 1, 1]))
But with buffered
, it iterates once, feeding me 2 1d arrays:
In [22]: it =np.nditer((A1,np.arange(2)[None,:,None]), flags=['external_loop','buffered'])
In [23]: while not it.finished:
...: print(it[:])
...: it.iternext()
...:
(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]),
array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]))
Does Cython offer any reasonably easy and efficient way to iterate Numpy arrays as if they were flat?
has some speed tests, showing that buffered external loop is fastest
cython
translates this into fast simple c
iteration:
for xarr in it:
x = xarr
size = x.shape[0]
for i in range(size):
x[i] = x[i]+1.0