
This is a rather simple operation, but it is repeated millions of times in my actual code and, if possible, I'd like to improve its performance.

import numpy as np

# Initial data array
xx = np.random.uniform(0., 1., (3, 14, 1))
# Coefficients used to modify 'xx'
a, b, c = np.random.uniform(0., 1., 3)

# Operation on 'xx' to obtain the final array 'yy'
yy = xx[0] * a * b + xx[1] * b + xx[2] * c

The last line is the one I'd like to improve. Basically, each term in xx is multiplied by a factor (given by the a, b, c coefficients) and the terms are then summed to give a final yy array of shape (14, 1), versus the (3, 14, 1) shape of the initial xx array.

Is it possible to do this via numpy broadcasting?

Gabriel
  • summing the values gives just one result for the axis, so yy.shape would be (1,14,1) – f5r5e5d Mar 04 '18 at 20:51
  • You are correct @f5r5e5d, I meant the same shape as in maintaining the (14, 1) shape. I'll fix this now. – Gabriel Mar 04 '18 at 20:53
  • [einsum](https://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html#numpy.einsum) is supposed to be very fast – f5r5e5d Mar 04 '18 at 21:45
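
A minimal sketch of the einsum route mentioned in the comment above, assuming the same xx, a, b and c defined in the question:

import numpy as np

# Same setup as in the question
xx = np.random.uniform(0., 1., (3, 14, 1))
a, b, c = np.random.uniform(0., 1., 3)

# 'i,ijk->jk' multiplies each 2D slice xx[i] by its coefficient and
# sums over i, leaving a (14, 1) result
yy = np.einsum('i,ijk->jk', np.array([a * b, b, c]), xx)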

2 Answers


For the first alternative, we could use broadcasted multiplication and then sum along the first axis.

For the second, we could bring in matrix-multiplication with np.dot. That gives us two proposed approaches. Here are the timings for the sample provided in the question -

# Original one
In [81]: %timeit xx[0] * a * b + xx[1] * b + xx[2] * c
100000 loops, best of 3: 5.04 µs per loop

# Proposed alternative #1
In [82]: %timeit (xx *np.array([a*b,b,c])[:,None,None]).sum(0)
100000 loops, best of 3: 4.44 µs per loop

# Proposed alternative #2
In [83]: %timeit np.array([a*b,b,c]).dot(xx[...,0])[:,None]
1000000 loops, best of 3: 1.51 µs per loop
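
A quick sanity check, assuming the same xx, a, b and c as in the question, that both alternatives reproduce the original result:

import numpy as np

xx = np.random.uniform(0., 1., (3, 14, 1))
a, b, c = np.random.uniform(0., 1., 3)

yy = xx[0] * a * b + xx[1] * b + xx[2] * c

# Alternative #1: broadcasted multiply, then sum along the first axis
alt1 = (xx * np.array([a * b, b, c])[:, None, None]).sum(0)
# Alternative #2: dot the coefficient vector with the (3, 14) slice
alt2 = np.array([a * b, b, c]).dot(xx[..., 0])[:, None]

print(np.allclose(yy, alt1), np.allclose(yy, alt2))  # True True
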
Divakar
  • This approach is the fastest of all, ~2.5X faster than my original code. Once again, thank you Divakar! – Gabriel Mar 04 '18 at 21:23

This is similar to Divakar's answer. Swap the first and third axes of xx and take the dot product.

import numpy as np

# Initial data array
xx = np.random.uniform(0., 1., (3, 14, 1))
# Coefficients used to modify 'xx'
a, b, c = np.random.uniform(0., 1., 3)

def op():
    yy = xx[0] * a * b + xx[1] * b + xx[2] * c
    return yy

def tai():
    d = np.array([a*b, b, c])
    return np.swapaxes(np.swapaxes(xx, 0, 2).dot(d), 0, 1)

def Divakar():
    # improvement suggested by Divakar in the comments
    return np.array([a*b, b, c]).dot(xx.swapaxes(0, 1))

%timeit op()
7.21 µs ± 222 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit tai()
4.06 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit Divakar()
3 µs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
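
A shape-tracing sketch of why the swapaxes/dot route gives the same (14, 1) result, assuming the same xx, a, b and c as above:

import numpy as np

xx = np.random.uniform(0., 1., (3, 14, 1))
a, b, c = np.random.uniform(0., 1., 3)
d = np.array([a * b, b, c])        # shape (3,)

swapped = np.swapaxes(xx, 0, 2)    # shape (1, 14, 3)
dotted = swapped.dot(d)            # np.dot sums over the last axis -> (1, 14)
yy = np.swapaxes(dotted, 0, 1)     # back to shape (14, 1)

print(np.allclose(yy, xx[0] * a * b + xx[1] * b + xx[2] * c))  # True
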
Tai
  • I was not aware of the `np.swapaxes()` function, thank you. This method is faster than my original approach, but still almost 2X slower than Divakar's approach. – Gabriel Mar 04 '18 at 21:21
  • @Gabriel that's true. Divakar's method is fast :P – Tai Mar 04 '18 at 21:23
  • @Gabriel `np.array([a*b,b,c]).dot(xx.swapaxes(0,1))` could bring the performance of this solution closer to mine. – Divakar Mar 04 '18 at 21:30
  • @Divakar indeed it does. Yours is still faster though. – Gabriel Mar 04 '18 at 21:38