
This is a rather simple operation, but it is repeated millions of times in my actual code and, if possible, I'd like to improve its performance.

import numpy as np

# Initial data array
xx = np.random.uniform(0., 1., (3, 14, 1))
# Coefficients used to modify 'xx'
a, b, c = np.random.uniform(0., 1., 3)

# Operation on 'xx' to obtain the final array 'yy'
yy = xx[0] * a * b + xx[1] * b + xx[2] * c

The last line is the one I'd like to improve. Basically, each term in xx is multiplied by a factor (given by the a, b, c coefficients) and the terms are then summed to give a final yy array of shape (14, 1), versus the (3, 14, 1) shape of the initial xx array.

Is it possible to do this via numpy broadcasting?

Gabriel
  • summing the values gives just one result for the axis, so yy.shape would be (1,14,1) – f5r5e5d Mar 04 '18 at 20:51
  • You are correct @f5r5e5d, I meant the same shape as in maintaining the (14, 1) shape. I'll fix this now. – Gabriel Mar 04 '18 at 20:53
  • [einsum](https://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html#numpy.einsum) is supposed to be very fast – f5r5e5d Mar 04 '18 at 21:45
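
A minimal sketch of the einsum route mentioned in the comment above, assuming the same xx, a, b and c defined in the question:

import numpy as np

# Same setup as in the question
xx = np.random.uniform(0., 1., (3, 14, 1))
a, b, c = np.random.uniform(0., 1., 3)

# 'i,ijk->jk' multiplies each 2D slice xx[i] by its coefficient and
# sums over i, leaving a (14, 1) result
yy = np.einsum('i,ijk->jk', np.array([a * b, b, c]), xx)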

2 Answers


For the first alternative, we could use broadcasted multiplication and then sum along the first axis.

For the second, we could bring in matrix-multiplication with np.dot. That gives us two proposed approaches. Here are the timings for the sample provided in the question -

# Original one
In [81]: %timeit xx[0] * a * b + xx[1] * b + xx[2] * c
100000 loops, best of 3: 5.04 µs per loop

# Proposed alternative #1
In [82]: %timeit (xx *np.array([a*b,b,c])[:,None,None]).sum(0)
100000 loops, best of 3: 4.44 µs per loop

# Proposed alternative #2
In [83]: %timeit np.array([a*b,b,c]).dot(xx[...,0])[:,None]
1000000 loops, best of 3: 1.51 µs per loop
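
A quick sanity check, assuming the same xx, a, b and c as in the question, that both alternatives reproduce the original result:

import numpy as np

xx = np.random.uniform(0., 1., (3, 14, 1))
a, b, c = np.random.uniform(0., 1., 3)

yy = xx[0] * a * b + xx[1] * b + xx[2] * c

# Alternative #1: broadcasted multiply, then sum along the first axis
alt1 = (xx * np.array([a * b, b, c])[:, None, None]).sum(0)
# Alternative #2: dot the coefficient vector with the (3, 14) slice
alt2 = np.array([a * b, b, c]).dot(xx[..., 0])[:, None]

print(np.allclose(yy, alt1), np.allclose(yy, alt2))  # True True
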
Divakar
  • This approach is the fastest of all, ~2.5X faster than my original code. Once again, thank you Divakar! – Gabriel Mar 04 '18 at 21:23

This is similar to Divakar's answer. Swap the first and third axes of xx and take the dot product.

import numpy as np

# Initial data array
xx = np.random.uniform(0., 1., (3, 14, 1))
# Coefficients used to modify 'xx'
a, b, c = np.random.uniform(0., 1., 3)

def op():
    yy = xx[0] * a * b + xx[1] * b + xx[2] * c
    return yy

def tai():
    d = np.array([a*b, b, c])
    return np.swapaxes(np.swapaxes(xx, 0, 2).dot(d), 0, 1)

def Divakar():
    # improvement suggested by Divakar in the comments
    return np.array([a*b, b, c]).dot(xx.swapaxes(0, 1))

%timeit op()
7.21 µs ± 222 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit tai()
4.06 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit Divakar()
3 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
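
A shape-tracing sketch of why the swapaxes/dot route gives the same (14, 1) result, assuming the same xx, a, b and c as above:

import numpy as np

xx = np.random.uniform(0., 1., (3, 14, 1))
a, b, c = np.random.uniform(0., 1., 3)
d = np.array([a * b, b, c])        # shape (3,)

swapped = np.swapaxes(xx, 0, 2)    # shape (1, 14, 3)
dotted = swapped.dot(d)            # np.dot sums over the last axis -> (1, 14)
yy = np.swapaxes(dotted, 0, 1)     # back to shape (14, 1)

print(np.allclose(yy, xx[0] * a * b + xx[1] * b + xx[2] * c))  # True
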
Tai
  • I was not aware of the `np.swapaxes()` function, thank you. This method is faster than my original approach, but still almost 2X slower than Divakar's approach. – Gabriel Mar 04 '18 at 21:21
  • @Gabriel that's true. Divakar's method is fast :P – Tai Mar 04 '18 at 21:23
  • @Gabriel `np.array([a*b,b,c]).dot(xx.swapaxes(0,1))` could bring the performance of this solution closer to mine. – Divakar Mar 04 '18 at 21:30
  • @Divakar indeed it does. Yours is still faster though. – Gabriel Mar 04 '18 at 21:38