I'm doing a simple Monte Carlo simulation exercise using the ipcluster engines of IPython. I've noticed a huge difference in execution time depending on how I define my function, and I'd like to understand why. Here are the details:
When I define the task as below, it is fast:
def sample(n):
    return (rand(n)**2 + rand(n)**2 <= 1).sum()
When run in parallel:
from IPython.parallel import Client
rc = Client()
v = rc[:]
with v.sync_imports():
    from numpy.random import rand
n = 1000000
timeit -r 1 -n 1 print 4.* sum(v.map_sync(sample, [n]*len(v))) / (n*len(v))
3.141712
1 loops, best of 1: 53.4 ms per loop
But if I change the function to:
def sample(n):
    return sum(rand(n)**2 + rand(n)**2 <= 1)
I get:
3.141232
1 loops, best of 1: 3.81 s per loop
...which is about 71 times slower. What could be the reason for this?
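To check whether the slowdown comes from the engines or from the function itself, the two variants can be timed locally with no cluster involved. This is only a minimal sketch: it assumes that `sum` in the second version resolves to Python's builtin `sum` on the engines (rather than NumPy's), which is not confirmed above. The names `sample_method` and `sample_builtin` are illustrative and not part of the original code.

```python
import builtins
import timeit

from numpy.random import rand


def sample_method(n):
    # ndarray.sum(): the reduction runs inside NumPy's compiled code.
    return int((rand(n)**2 + rand(n)**2 <= 1).sum())


def sample_builtin(n):
    # Builtin sum (assumed to be what the engines call): it iterates over
    # the boolean array one element at a time in the interpreter.
    return int(builtins.sum(rand(n)**2 + rand(n)**2 <= 1))


n = 100000  # smaller than the n used above, to keep the check quick
t_method = timeit.timeit(lambda: sample_method(n), number=3)
t_builtin = timeit.timeit(lambda: sample_builtin(n), number=3)
print("ndarray.sum: %.4f s, builtin sum: %.4f s" % (t_method, t_builtin))
```

If the gap shows up here as well, the parallel machinery is probably not the cause, and the difference lies in which `sum` gets called.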