I'm computing the factorials of 10 random numbers using dispy, which distributes the tasks to various nodes. However, if one of the computations is the factorial of a large number, say factorial(100), that task takes a very long time, and dispy still runs it on a single node.
How can I make dispy break this task down and distribute it across other nodes so that it doesn't take so long?
Here's the code I have so far. It computes the factorials of 10 random numbers, except that the job with index 5 always computes factorial(100):
# 'compute' is distributed to each node running 'dispynode'
def compute(n):
    import time, socket
    ans = 1
    for i in range(1, n + 1):
        ans = ans * i
    time.sleep(n)
    host = socket.gethostname()
    return (host, n, ans)

if __name__ == '__main__':
    import dispy, random
    cluster = dispy.JobCluster(compute)
    jobs = []
    for i in range(10):
        # schedule execution of 'compute' on a node (running 'dispynode')
        # with a parameter (random number in this case)
        if i == 5:
            job = cluster.submit(100)
        else:
            job = cluster.submit(random.randint(5, 20))
        job.id = i  # optionally associate an ID to job (if needed later)
        jobs.append(job)
    # cluster.wait()  # waits for all scheduled jobs to finish
    for job in jobs:
        host, n, ans = job()  # waits for job to finish and returns results
        print('%s executed job %s at %s with %s as input and %s as output' % (host, job.id, job.start_time, n, ans))
        # other fields of 'job' that may be useful:
        # print(job.stdout, job.stderr, job.exception, job.ip_addr, job.start_time, job.end_time)
    cluster.print_status()
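
To illustrate what I mean by "breaking down" the task: factorial(n) can be written as a product of partial products over contiguous sub-ranges of 1..n, and each sub-range could in principle be a separate job. Below is a minimal local sketch of that decomposition, without dispy. The helper names `partial_product`, `split_ranges`, and `chunked_factorial` are my own, hypothetical ones, not part of dispy.

```python
import math
from functools import reduce


def partial_product(lo, hi):
    """Product of the integers lo..hi inclusive (1 if the range is empty)."""
    result = 1
    for i in range(lo, hi + 1):
        result *= i
    return result


def split_ranges(n, parts):
    """Split 1..n into at most `parts` contiguous (lo, hi) ranges."""
    parts = max(1, min(parts, n))
    size, extra = divmod(n, parts)
    ranges = []
    lo = 1
    for k in range(parts):
        # The first `extra` ranges get one additional element.
        hi = lo + size - 1 + (1 if k < extra else 0)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def chunked_factorial(n, parts=4):
    """Compute n! by multiplying independently computed partial products."""
    # Each partial_product call is independent of the others, so this is
    # the step that could be farmed out to separate nodes.
    chunks = [partial_product(lo, hi) for lo, hi in split_ranges(n, parts)]
    return reduce(lambda a, b: a * b, chunks, 1)


if __name__ == '__main__':
    print(split_ranges(100, 4))
    print(chunked_factorial(100, 4) == math.factorial(100))
```

My assumption is that each `(lo, hi)` pair could be submitted as its own job, for example with `cluster.submit(lo, hi)` and a `compute(lo, hi)` that returns the partial product, and that the results would then be multiplied together on the client. I haven't verified that this is the intended way to do it with dispy.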