
I am currently trying to optimise a program. The major bottlenecks are actually fairly simple one-line calculations operating on numpy arrays, e.g.:

(p-1) * c**(p-1)/(v_dt+c)**p

(p and c here are floats, and v_dt is a ~500-element numpy array of floats.)

This calculation takes around 1/50 of a second on my machine

(Timed using timeit: 1000 loops, best of 3: 21.8 msec per loop)

The problem is that this small function (and I have several others like it) is called some 500 times for each iteration of a loop that itself runs a variable number of times, around 100. So this one little line suddenly adds 20 minutes to my runtime.

What are the best ways of speeding up mathematical calculations in Python? How much can be done with Python tricks? I have looked into ctypes and possibly Cython, but how can I use these? Do I need to write C code for these bottleneck functions, or can I employ already-compiled libraries? (I have no experience with C.)

Many thanks.

Edit: I forgot to mention that I am already looking at parallelisation options for the loops, but I still want to speed up these bottleneck functions directly, as this is performance-critical code.

BJH
    I would recommend looking for macro-optimizations before diving into micro-optimizations: are you sure the program needs to loop that many times? Does the nested loop need to perform this calculation every time? Can some of these calculations be saved for later reference sometimes? – TigerhawkT3 Jul 30 '15 at 18:11
    Maybe try it here http://codereview.stackexchange.com/ – Mihai Jul 30 '15 at 18:11
  • When you call a function to solve this equation, which variables are expected to be changing? I'm hinting that perhaps if only `v_dt` is changing, you could calculate and re-use the `c**(p-1)` component. – mike.k Jul 30 '15 at 18:12
    The best way to speed up is to write more efficient code. Rethink the calculations, precalculate constants, use lookup tables, break the long calculations to stages. If nothing is helping - reconsider your tools. – Eugene Sh. Jul 30 '15 at 18:15
  • @Mihai Thank you, I wasn't aware of that site, I'll post there – BJH Jul 30 '15 at 18:16
  • @TigerhawkT3 That's a fair comment, but I am sure the calculation needs to be performed that many times. I have actually got rid of the actual for loop by creating a partial function and using multiprocessing.Pool.map(), which sped things up a lot. But the calculation still ends up as a major bottleneck. The outer loop is implicit as I am performing non-linear constrained minimisation on the wider functions, which tends to iterate 100+ times – BJH Jul 30 '15 at 18:19
    @BJH Keep in mind that Code Review requires the full, **working** code for a question to be accepted. This snippet is `example code` and will likely not be received well. If you are posting the entire scenario there (including use-case) then we greatly welcome it. :) – Der Kommissar Jul 30 '15 at 18:20
  • @mike.k I know, but annoyingly both c and v_dt vary with each iteration. – BJH Jul 30 '15 at 18:22
    Exponentiation is generally expensive. Rather than `(p-1) * c**(p-1)/(v_dt+c)**p`, see if `(p-1) * (c/(v_dt+c))**p / c` is any faster. –  Jul 30 '15 at 18:24
    It's hard to suggest optimizations when you haven't provided any context for that line of code. Which of those parameters vary from call to call? You may be able to pre-compute some of that expression. What do you use the result for? For all we know, there may be ways to skip the computation of that intermediate array altogether. – ali_m Jul 30 '15 at 18:24
  • Generally for `numpy` questions, SO is better than CR. There are a lot more `numpy` knowledgeable posters on SO. – hpaulj Jul 30 '15 at 18:53
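
As a quick check of the rewrite suggested in the exponentiation comment above, here is a minimal sketch using assumed, representative values for `p`, `c` and `v_dt` (the identity used is `c**(p-1) == c**p / c`):

```python
import timeit
import numpy as np

p, c = 2.123, 1.324                      # representative scalars (assumed values)
v_dt = np.random.uniform(0.5, 2.0, 500)  # stand-in for the ~500-element array

original = (p - 1) * c**(p - 1) / (v_dt + c)**p
rewritten = (p - 1) * (c / (v_dt + c))**p / c  # a single power operation

# The two forms are algebraically identical, so the results should match.
assert np.allclose(original, rewritten)

print(timeit.timeit(lambda: (p - 1) * c**(p - 1) / (v_dt + c)**p, number=10000))
print(timeit.timeit(lambda: (p - 1) * (c / (v_dt + c))**p / c, number=10000))
```

Whether the rewrite actually wins depends on the machine and on the values of `p` and `c`, so it is worth timing both forms on the real data.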

2 Answers


In my naive tests this doesn't look all that expensive:

In [65]: p,c =2.,2.
In [66]: v_dt=np.ones(500)*1.5
In [67]: x=(p-1)*c**(p-1)/(v_dt+c)**p
In [68]: timeit x=(p-1)*c**(p-1)/(v_dt+c)**p
10000 loops, best of 3: 23.5 µs per loop

A little more expensive with different p and c:

In [77]: p,c =2.123,1.324
In [78]: timeit x=(p-1)*c**(p-1)/(v_dt+c)**p
10000 loops, best of 3: 95.9 µs per loop

Most of the time is in the vector exponentiation:

In [82]: %timeit v_dt**p
10000 loops, best of 3: 75.5 µs per loop

(This is on a Centron laptop of Windows 7 vintage.)

This isn't the kind of calculation that Cython or other do-it-yourself compiled code can do better. `numpy` is already tuned to perform math like this efficiently.

I think you need to look at the bigger picture. Why does this need to be called so often? Can you call it fewer times with larger arrays?
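
To make "call it fewer times with larger arrays" concrete, here is a minimal sketch, assuming the ~500 per-iteration calls can be gathered up front; the names `cs` and `v_dts` and the random values are illustrative only:

```python
import numpy as np

p = 2.123                                        # assumed shared across the batch
cs = np.random.uniform(1.0, 2.0, 500)            # one c per call (illustrative)
v_dts = np.random.uniform(0.5, 2.0, (500, 500))  # one 500-element v_dt per call

# cs[:, None] has shape (500, 1), so each c broadcasts across its own row of v_dts.
# One vectorised call replaces 500 small ones.
result = (p - 1) * cs[:, None]**(p - 1) / (v_dts + cs[:, None])**p
print(result.shape)  # (500, 500): row i is the result for call i
```

This only helps if the calls are independent and their inputs are available together, which depends on how the surrounding minimisation loop is structured.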

hpaulj

I thought this article was surprising and interesting.

A quick summary:

  • Most interesting is that for small arrays (<150 elements) the author found that plain Python was actually faster than Numpy. Less overhead, I guess.

  • You could also write your inner loop in C++ and just call it through Python.

  • You could look into Numba, which seems like a very easy way to speed up simple calculations.
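
For the Numba route, a minimal sketch (assuming Numba is installed; the function name is illustrative) could look like this:

```python
import numpy as np
from numba import njit

@njit(cache=True)
def bottleneck(p, c, v_dt):
    # Same one-liner as in the question, compiled by Numba on first call.
    return (p - 1) * c**(p - 1) / (v_dt + c)**p

v_dt = np.linspace(0.5, 2.0, 500)
out = bottleneck(2.123, 1.324, v_dt)  # the first call also pays the JIT compilation cost
```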

Finally, I've gotten speedups by reorganizing functions so that the vector part is only touched once.

As an example, instead of `a * (b * (c * vector))`, which does three vector multiplications, you could do `(a * b * c) * vector`, which does one.
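
A quick, illustrative timing of that reorganisation (the scalar values and array size here are assumed):

```python
import timeit
import numpy as np

a, b, c = 1.1, 2.2, 3.3
vector = np.random.rand(500)

# Three array multiplications versus one (scalars folded together first).
t_three = timeit.timeit(lambda: a * (b * (c * vector)), number=100000)
t_one = timeit.timeit(lambda: (a * b * c) * vector, number=100000)
print(t_three, t_one)
```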

Sam Bobel