
So I was wondering if there is a quicker method than this for applying two equations to an array. `d84`, `slope`, `q_dis` and `recking_parameter` are all 3000-by-3000 float arrays.

# Work out the equation for over 100
depth_100 = 1 / (3.2 * np.power((9.81 * slope), 0.3) * np.power(d84, 3 * 0.3 - 1) * np.power(q_dis, (-2 * 0.3)))

# Work out the equation for under 100
depth_1 = 1 / (1.6 * np.power((9.81 * slope), 0.23) * np.power(d84, 3 * 0.23 - 1) * np.power(q_dis, (-2 * 0.23)))

depth_fin = np.zeros_like(slope, dtype=float)

# Use np.putmask to place the calculated values into the array based on the conditional.
np.putmask(depth_fin, recking_parameter >= 100, depth_100)
np.putmask(depth_fin, recking_parameter < 100, depth_1)
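For reference, the two `putmask` calls can also be expressed as a single `np.where` call. A minimal sketch, using small random arrays as stand-ins for the real 3000-by-3000 data, showing that both approaches produce the same result:

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (4, 4)  # stand-in for the real 3000 x 3000 arrays
slope = rng.uniform(0.001, 0.1, shape)
d84 = rng.uniform(0.01, 1.0, shape)
q_dis = rng.uniform(0.1, 10.0, shape)
recking_parameter = rng.uniform(0, 200, shape)

depth_100 = 1 / (3.2 * (9.81 * slope)**0.3 * d84**(3 * 0.3 - 1) * q_dis**(-2 * 0.3))
depth_1 = 1 / (1.6 * (9.81 * slope)**0.23 * d84**(3 * 0.23 - 1) * q_dis**(-2 * 0.23))

# putmask version from the question
depth_fin = np.zeros_like(slope, dtype=float)
np.putmask(depth_fin, recking_parameter >= 100, depth_100)
np.putmask(depth_fin, recking_parameter < 100, depth_1)

# single-call equivalent
depth_where = np.where(recking_parameter >= 100, depth_100, depth_1)

assert np.allclose(depth_fin, depth_where)
```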
Nick Jones
  • This question would be better suited for http://codereview.stackexchange.com/ you should consider reposting it there. It is also not clear what `d84`, `slope` and `q_dis` are. – ebarr Aug 06 '14 at 08:58
  • Great ok I will post on there too. I have clarified what they are at the top now. Hope that helps. Thanks. – Nick Jones Aug 06 '14 at 09:11
  • @ebarr: `np.zeros_like` accepts a `dtype` argument. It overrides the default of using the dtype of the first argument. – Warren Weckesser Aug 06 '14 at 13:08
  • @warrenweckesser True, for newer versions of numpy. For version 1.5.1 (and I assume earlier) `zeros_like` and `empty_like` will not take `dtype` arguments. – ebarr Aug 07 '14 at 01:39

3 Answers


In your question title you've put the emphasis on np.putmask, but in fact most of the time is spent in the arithmetic. Taking the powers is computationally expensive, but you can still improve the run time by avoiding temporary arrays. You could use in-place operators (as in @Davidmh's answer), or you could use the numexpr module, for example:

import numexpr

depth_100 = '1 / (3.2 * (9.81 * slope)**0.3 * d84**-0.1 * q_dis**-0.6)'
depth_1 = '1 / (1.6 * (9.81 * slope)**0.23 * d84**-0.31 * q_dis**-0.46)'
depth_fin = 'where(recking_parameter < 100, '+depth_1+', '+depth_100+')'
depth_fin = numexpr.evaluate(depth_fin)

The nice thing about numexpr is that it will also use multiple cores. In my testing on a dual-core machine it's roughly 4 times faster than the original code, and there's potentially an even larger speedup to be gained depending on the CPU you have.

  • Numexpr does a lot of good magic, like optimising the cache and parallelising. This is certainly the best way. – Davidmh Aug 06 '14 at 23:57
  • numexpr is certainly the easiest way to go, but as the power is by far the most expensive part of this code you can also get a decent speedup by using Python threads to parallelize this (NumPy releases the GIL), though you need to do the blocking yourself. – jtaylor Aug 07 '14 at 00:27
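The threaded blocking approach jtaylor describes can be sketched as follows. This is a minimal illustration, not a tuned implementation: the thread count, array sizes and the row-wise split are all arbitrary choices.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def depth_block(out, slope, d84, q_dis, lo, hi):
    # Each thread fills its own row slice; NumPy releases the GIL
    # inside the power calls, so the threads can run concurrently.
    s = slice(lo, hi)
    out[s] = 1 / (3.2 * (9.81 * slope[s])**0.3
                  * d84[s]**-0.1 * q_dis[s]**-0.6)

rng = np.random.default_rng(2)
shape = (400, 400)  # stand-in for the real 3000 x 3000 arrays
slope = rng.uniform(0.001, 0.1, shape)
d84 = rng.uniform(0.01, 1.0, shape)
q_dis = rng.uniform(0.1, 10.0, shape)

out = np.empty_like(slope)
n_threads, rows = 4, shape[0]
bounds = np.linspace(0, rows, n_threads + 1).astype(int)
with ThreadPoolExecutor(n_threads) as pool:
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        pool.submit(depth_block, out, slope, d84, q_dis, lo, hi)

# matches the single-threaded result
expected = 1 / (3.2 * (9.81 * slope)**0.3 * d84**-0.1 * q_dis**-0.6)
assert np.allclose(out, expected)
```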

You can get a ~14% improvement by pre-calculating the 9.81*slope array and switching out the zeros_like call with an empty_like call (I've compacted the code here just so it fits on the page):

slope_pre = 9.81 * slope
depth_100 = 1 / (3.2 * slope_pre**0.3 * d84**-0.1 * q_dis**-0.6)
depth_1 = 1 / (1.6 * slope_pre**0.23 * d84**-0.31 * q_dis**-0.46)
depth_fin = np.empty_like(slope)
np.putmask(depth_fin, recking_parameter >= 100, depth_100)
np.putmask(depth_fin, recking_parameter < 100, depth_1)
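One caveat with `empty_like`: the array starts out with uninitialized memory, so this is only safe because the two masks are exact complements and together cover every element. A quick sanity check (the array values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
recking_parameter = rng.uniform(0, 200, (4, 4))

mask_hi = recking_parameter >= 100
mask_lo = recking_parameter < 100

# Every element is covered by exactly one of the two masks,
# so no uninitialized value survives in the empty_like array.
assert np.all(mask_hi ^ mask_lo)
```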

Edit:

Another option would be to investigate using the Numba JIT compiler to try to get a compiled solution. Admittedly, I have tried this and not gotten very far, but it is clear from questions such as Numba code slower than pure python that it is possible to get large speed-ups on simple calculations like this.
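The kind of explicit loop Numba compiles well might look like the sketch below. The kernel structure is an assumption on my part, and the import is guarded so the code still runs (slowly) as plain Python if Numba is not installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:          # fall back to plain Python if Numba is absent
    njit = lambda f: f

@njit
def depth_kernel(slope, d84, q_dis, recking):
    # Elementwise loop: picks the >=100 or <100 formula per cell.
    out = np.empty_like(slope)
    for i in range(slope.shape[0]):
        for j in range(slope.shape[1]):
            if recking[i, j] >= 100:
                out[i, j] = 1 / (3.2 * (9.81 * slope[i, j])**0.3
                                 * d84[i, j]**-0.1 * q_dis[i, j]**-0.6)
            else:
                out[i, j] = 1 / (1.6 * (9.81 * slope[i, j])**0.23
                                 * d84[i, j]**-0.31 * q_dis[i, j]**-0.46)
    return out

rng = np.random.default_rng(5)
shape = (8, 8)  # stand-in for the real data
slope = rng.uniform(0.001, 0.1, shape)
d84 = rng.uniform(0.01, 1.0, shape)
q_dis = rng.uniform(0.1, 10.0, shape)
recking = rng.uniform(0, 200, shape)
result = depth_kernel(slope, d84, q_dis, recking)
```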

ebarr
  • Numba is amazing, but I find it still too experimental to be useful. They have too many "basic" bugs here and there. – Davidmh Aug 07 '14 at 09:54
  • @Davidmh I have only had a brief play around with it and when you get it right, it's awesome. Not always clear how to get it right though... – ebarr Aug 07 '14 at 10:02

If you cannot depend on numexpr, as indicated by moarningsun, you can make the computations in place. So, for example:

pre_slope = 9.81 * slope
depth_100 = 1 / (3.2 * pre_slope**0.3 * d84**-0.1 * q_dis**-0.6)

is making a temporary copy of pre_slope**0.3, another of d84**-0.1 and so on, and then creating a brand new array (discarding the old ones as needed) for each operation. That requires a lot of memory.

It can be avoided:

depth_100 = d84**0.1
depth_100 *= q_dis**0.6
depth_100 /= pre_slope**0.3
depth_100 *= 1/3.2    # Note that multiplying is faster than dividing.

Now there are fewer temporary copies. And for depth_1 you can do even better if you don't need to keep the other arrays afterwards:

d84 **= 0.31
q_dis **= 0.46
pre_slope **= 0.23

depth_1 = d84
depth_1 *= q_dis
depth_1 /= pre_slope
depth_1 *= 1/1.6
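As a sanity check, the in-place chain above gives the same numbers as the naive one-liner (small random arrays used here as stand-ins for the real data; note that the in-place version overwrites `d84`, `q_dis` and `pre_slope`):

```python
import numpy as np

rng = np.random.default_rng(4)
shape = (4, 4)  # stand-in for the real 3000 x 3000 arrays
slope = rng.uniform(0.001, 0.1, shape)
d84 = rng.uniform(0.01, 1.0, shape)
q_dis = rng.uniform(0.1, 10.0, shape)
pre_slope = 9.81 * slope

# naive version (allocates several temporaries)
naive = 1 / (1.6 * pre_slope**0.23 * d84**-0.31 * q_dis**-0.46)

# in-place version: reuses d84, q_dis and pre_slope as scratch space
d84 **= 0.31
q_dis **= 0.46
pre_slope **= 0.23
depth_1 = d84
depth_1 *= q_dis
depth_1 /= pre_slope
depth_1 *= 1 / 1.6

assert np.allclose(depth_1, naive)
```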
Davidmh
  • Yeah, I do not think I will be able to depend on numexpr. All brilliant answers and really useful information for a newbie programmer like me! – Nick Jones Aug 08 '14 at 08:48