
I have used parallel computing before through MPI (and Fortran :)). I would now like to use the parallel capabilities of IPython.

My question is related to the poor performance of the following code, inspired by http://ipython.org/ipython-doc/dev/parallel/asyncresult.html:

from IPython.parallel import Client
import numpy as np

_procs = Client()                  # connect to the running IPython cluster
print 'engines #', len(_procs)     # number of available engines
dv = _procs.direct_view()          # direct view on all engines

X = np.linspace(0, 100)            # 50 points by default

add = lambda a, b: a + b
sq = lambda x: x * x

%timeit reduce(add, map(sq, X))      # serial map/reduce
%timeit reduce(add, dv.map(sq, X))   # parallel map across the engines

The results with one engine are:

10000 loops, best of 3: 43 µs per loop
100 loops, best of 3: 4.77 ms per loop

Could you tell me whether these results seem normal to you and, if so, why there is such a huge difference in computation time?

Best regards, Flavien.

Flavien Lambert

1 Answer


Parallel processing doesn't come for free. There is a cost, called overhead, associated with sending work items to the engines and receiving the results afterwards. Your original computation takes 43 µs, which is simply too short for that cost to pay off. You need much larger work items before parallel processing becomes beneficial. A simple rule of thumb is that each worker should spend at least about 10 times the overhead processing its work items; a quick way to estimate the overhead is to time an empty task, as in the sketch below. Try a vector of one million elements, or even larger, instead.
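A minimal sketch of such a measurement, assuming `dv` is the direct view from the question (the `noop` name is made up for illustration):

noop = lambda: None
# Everything measured here is scheduling and serialization
# overhead: the task itself does no work at all.
%timeit dv.apply_sync(noop)

Whatever this reports is roughly the fixed price of every round trip to the engines, so a work item only pays off once it runs for a sizeable multiple of that.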

Hristo Iliev
  • Here are the results for different lengths of the array (serial, then parallel):
    length # 10000:   100 loops, best of 3: 8.68 ms per loop | 10 loops, best of 3: 61.5 ms per loop
    length # 100000:  10 loops, best of 3: 89.3 ms per loop  | 1 loops, best of 3: 558 ms per loop
    length # 1000000: 1 loops, best of 3: 911 ms per loop    | 1 loops, best of 3: 5.59 s per loop
    – Flavien Lambert Oct 07 '14 at 02:55
  • Probably marshalling and unmarshalling the array items is taking far too long relative to the work done. Try doing much more work on each element than simply squaring it, along the lines of the sketch below. – Hristo Iliev Oct 07 '14 at 08:46
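To illustrate the comment, a minimal sketch with a deliberately CPU-bound per-element function (`heavy` is made up for illustration; `dv` and `np` are as in the question):

def heavy(x):
    # Artificial CPU-bound kernel: thousands of float operations per
    # element, so computation dwarfs the cost of shipping the data.
    s = 0.0
    for i in xrange(10000):
        s += (x * i) % 7.0
    return s

X = np.linspace(0, 100, 10000)

%timeit map(heavy, X)          # serial baseline
%timeit dv.map_sync(heavy, X)  # parallel: chunks of X are sent to each engine

With enough work per element the parallel version should start to win; if it still doesn't, shipping the array chunks themselves is the bottleneck.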