I'm developing code that integrates an ODE using scipy's complex_ode, where the integrand includes a Fourier transform and exponential operator acting on a large array of complex values.

To optimize performance, I've profiled the code and found that, after optimizing the FFTs (using PyFFTW etc.), the main bottleneck is this line:

val = np.exp(float_value * arr)

I'm currently using NumPy, which I understand calls C code and should therefore be quick. Is there any way to improve performance further?

I've looked into using Numba, but since my main loop also includes FFTs, I don't think it can be compiled (the nopython=True flag leads to errors), so I suspect it offers no gain.

Here is a test example for the code I'd like to optimize:

import numpy as np

arr = np.random.rand(2**14) + 1j * np.random.rand(2**14)
float_value = 0.5
%timeit np.exp(float_value * arr)

Any suggestions welcome, thanks.

SLater01

1 Answer

We could leverage the numexpr module, which works efficiently on large arrays involving transcendental operations:

In [91]: arr = np.random.rand(2**14) + 1j *np.random.rand(2**14)
    ...: float_value = 0.5
    ...: 

In [92]: %timeit np.exp(float_value * arr)
1000 loops, best of 3: 739 µs per loop

In [94]: import numexpr as ne

In [95]: %timeit ne.evaluate('exp(float_value*arr)')
1000 loops, best of 3: 241 µs per loop

This seems consistent with the expected performance stated in the docs.
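Outside IPython, the same idea can be wrapped in a function and checked against NumPy. The following is only a sketch: the helper name `exp_scaled` and the operand names `a` and `x` are made up for illustration, and the fallback to NumPy when numexpr is not installed is an added assumption, not part of the timing above.

```python
import numpy as np

try:
    import numexpr as ne  # optional third-party dependency
except ImportError:
    ne = None


def exp_scaled(arr, float_value):
    """Return exp(float_value * arr), using numexpr when it is installed.

    Hypothetical helper for illustration only.
    """
    if ne is None:
        # Fallback (assumption): plain NumPy if numexpr is unavailable.
        return np.exp(float_value * arr)
    # Pass the operands explicitly so the expression string does not
    # depend on variable names in the caller's scope.
    return ne.evaluate("exp(a * x)", local_dict={"a": float_value, "x": arr})


rng = np.random.default_rng(0)
arr = rng.random(2**14) + 1j * rng.random(2**14)
float_value = 0.5

result = exp_scaled(arr, float_value)
reference = np.exp(float_value * arr)
print(np.allclose(result, reference))
```

Checking against `np.exp` before swapping the call into the integrand is a cheap way to confirm both paths agree on complex input.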

Divakar
  • Thanks - this numexpr approach looks interesting. I hadn't seen this in the usual online guidance for optimization which typically highlights Numba, Cython etc. While standalone %timeits show a >2x improvement, when used in my integrand the solver performance shows no difference. I'll investigate more and post back / accept the answer. – SLater01 Oct 02 '17 at 13:34
  • @SLater01 Are you working with large data there? Also, when timing within the larger solver setup, make sure you are timing only this exponential part, because other parts of the code may be taking a bigger share of the runtime. – Divakar Oct 02 '17 at 14:26
  • Usually only a small pickup, but you can also look at using the `out=` param of either of these functions to reuse temporary arrays. – chrisb Oct 02 '17 at 18:38
  • Yes, with explicit timing it still shows almost no speed-up in the integrand, which is odd. Perhaps the overhead of many calls defeats the benefits? But I would have thought this would be apparent using %timeit too. Strange. – SLater01 Oct 03 '17 at 11:08
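The `out=` suggestion in the comments could look roughly like the sketch below, shown with NumPy only. The helper name `exp_scaled_inplace` and the buffer name `buf` are hypothetical, and whether this helps inside the solver would need to be measured there.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.random(2**14) + 1j * rng.random(2**14)
float_value = 0.5

# Allocate the scratch buffer once, outside the hot loop.
buf = np.empty_like(arr)


def exp_scaled_inplace(arr, float_value, out):
    """Compute exp(float_value * arr) into `out` without new temporaries.

    Hypothetical helper for illustration only.
    """
    np.multiply(arr, float_value, out=out)  # out <- float_value * arr
    np.exp(out, out=out)                    # out <- exp(out), in place
    return out


val = exp_scaled_inplace(arr, float_value, buf)
print(np.allclose(val, np.exp(float_value * arr)))
```

Note that `val` is the same array as `buf`, so the next call overwrites it; copy the result if the solver needs to keep it between calls.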