TL;DR: I can't get the most basic dispy
sample code to run properly. Why not?
The details:
I'm trying to get into distributed processing in Python, and thought the dispy library sounded interesting because of its comprehensive feature set.
However, I've been trying to follow their basic canonical program example, and I'm getting nowhere.
- I've installed dispy (python -m pip install dispy).
- I went to another machine on the same subnet and ran python dispynode.py. It seems to work, as I get the following output:

2016-06-14 10:33:38 dispynode - dispynode version 4.6.14
2016-06-14 10:33:38 asyncoro - version 4.1 with epoll I/O notifier
2016-06-14 10:33:38 dispynode - serving 8 cpus at 10.0.48.54:51348
Enter "quit" or "exit" to terminate dispynode, "stop" to stop service, "start" to restart service, "cpus" to change CPUs used, anything else to get status:

- Back on my client machine, I run the sample code downloaded from http://dispy.sourceforge.net/_downloads/sample.py, copied here:
# function 'compute' is distributed and executed with arguments
# supplied with 'cluster.submit' below
def compute(n):
    import time, socket
    time.sleep(n)
    host = socket.gethostname()
    return (host, n)

if __name__ == '__main__':
    # executed on client only; variables created below, including modules imported,
    # are not available in job computations
    import dispy, random
    # distribute 'compute' to nodes; 'compute' does not have any dependencies (needed from client)
    cluster = dispy.JobCluster(compute)
    # run 'compute' with 20 random numbers on available CPUs
    jobs = []
    for i in range(20):
        job = cluster.submit(random.randint(5, 20))
        job.id = i  # associate an ID to identify jobs (if needed later)
        jobs.append(job)
    # cluster.wait()  # waits until all jobs finish
    for job in jobs:
        host, n = job()  # waits for job to finish and returns results
        print('%s executed job %s at %s with %s' % (host, job.id, job.start_time, n))
        # other fields of 'job' that may be useful:
        # job.stdout, job.stderr, job.exception, job.ip_addr, job.end_time
    cluster.print_status()  # shows which nodes executed how many jobs etc.
When I run this (python sample.py), it just hangs. Debugging through pdb, I see it eventually hangs at dispy/__init__.py(117)__call__(). The line reads self.finish.wait(); finish is a Python threading event, and wait() goes into lib/python3.5/threading.py(531)wait(). It hangs once it hits that wait.
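To check whether the jobs ever leave their initial state without blocking forever, I'm planning to poll job.status instead of calling job(). A minimal sketch, assuming the DispyJob status constants (Finished, Terminated) and the exception attribute described in the dispy docs, which I haven't verified against this exact version:

import time
import dispy

# 'cluster' and 'jobs' are set up exactly as in sample.py above.
# Poll each job instead of blocking on job(), so the script can report
# progress rather than hanging silently on finish.wait().
pending = list(jobs)
while pending:
    for job in pending[:]:
        if job.status in (dispy.DispyJob.Finished, dispy.DispyJob.Terminated):
            print('job %s: status=%s exception=%s' % (job.id, job.status, job.exception))
            pending.remove(job)
    print('%d jobs still pending' % len(pending))
    time.sleep(2)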
I've tried running dispynode on the client machine and got the same results. I've tried many variants of passing nodes when creating the cluster, e.g.:
cluster = dispy.JobCluster(compute, nodes=['localhost'])
cluster = dispy.JobCluster(compute, nodes=['*'])
cluster = dispy.JobCluster(compute, nodes=[<hostname of the remote machine running dispynode>])
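One more variant I'm considering, in case the client is binding the wrong local interface (this assumes the ip_addr keyword described in the dispy docs; the address below is only a placeholder for this client's address on the 10.0.48.x subnet):

cluster = dispy.JobCluster(compute, nodes=['10.0.48.54'],
                           ip_addr='10.0.48.53')  # placeholder: this client's IP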
I've tried running with the cluster.wait() line uncommented and got the same results.
When I added logging (cluster = dispy.JobCluster(compute, loglevel=10)), I got the following output on the client side:
2016-06-14 10:27:01 asyncoro - version 4.1 with epoll I/O notifier
2016-06-14 10:27:01 dispy - dispy client at :51347
2016-06-14 10:27:01 dispy - Storing fault recovery information in "_dispy_20160614102701"
2016-06-14 10:27:01 dispy - Pending jobs: 0
2016-06-14 10:27:01 dispy - Pending jobs: 1
2016-06-14 10:27:01 dispy - Pending jobs: 2
2016-06-14 10:27:01 dispy - Pending jobs: 3
2016-06-14 10:27:01 dispy - Pending jobs: 4
2016-06-14 10:27:01 dispy - Pending jobs: 5
2016-06-14 10:27:01 dispy - Pending jobs: 6
2016-06-14 10:27:01 dispy - Pending jobs: 7
2016-06-14 10:27:01 dispy - Pending jobs: 8
2016-06-14 10:27:01 dispy - Pending jobs: 9
2016-06-14 10:27:01 dispy - Pending jobs: 10
This doesn't seem unexpected, but doesn't help me figure out why the jobs aren't running.
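To get more visibility into whether any node is discovered at all, I may try a cluster_status callback next. A sketch, assuming the cluster_status keyword and its (status, node, job) callback signature from the dispy docs:

def status_cb(status, node, job):
    # Print every node/job status change the client sees, so it's
    # obvious whether any dispynode is ever found.
    print('status=%s node=%s job=%s' % (status, node, job))

cluster = dispy.JobCluster(compute, cluster_status=status_cb, loglevel=10)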
For what it's worth, here's _dispy_20160614102701.bak:
'_cluster', (0, 207)
'compute_1465918021755', (512, 85)
and similarly, _dispy_20160614102701.dir:
'_cluster', (0, 207)
'compute_1465918021755', (512, 85)
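As I understand it, these are the shelve files dispy writes for fault recovery; assuming the dispy.fault_recover_jobs function from the docs (which I haven't tried yet), the saved job state could be inspected like this:

import dispy
# Load whatever job state dispy saved in the fault-recovery file above.
recovered = dispy.fault_recover_jobs('_dispy_20160614102701')
for job in recovered:
    print(job.id, job.status)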
I'm out of guesses, unless I'm using an unstable version.