
TL;DR: I can't get the most basic dispy sample code to run properly. Why not?

The details:

I'm trying to get into distributed processing in Python, and the dispy library sounded interesting due to its comprehensive feature set.

However, I've been trying to follow their basic canonical program example, and I'm getting nowhere.

  • I've installed dispy (python -m pip install dispy)
  • I went to another machine on the same subnet and ran python dispynode.py. It seems to work, as I get the following output:

    2016-06-14 10:33:38 dispynode - dispynode version 4.6.14
    2016-06-14 10:33:38 asyncoro - version 4.1 with epoll I/O notifier
    2016-06-14 10:33:38 dispynode - serving 8 cpus at 10.0.48.54:51348

    Enter "quit" or "exit" to terminate dispynode, "stop" to stop
    service, "start" to restart service, "cpus" to change CPUs used,
    anything else to get status:

  • Back on my client machine, I run the sample code downloaded from http://dispy.sourceforge.net/_downloads/sample.py, copied here:


# function 'compute' is distributed and executed with arguments
# supplied with 'cluster.submit' below
def compute(n):
    import time, socket
    time.sleep(n)
    host = socket.gethostname()
    return (host, n)

if __name__ == '__main__':
    # executed on client only; variables created below, including modules imported,
    # are not available in job computations
    import dispy, random
    # distribute 'compute' to nodes; 'compute' does not have any dependencies (needed from client)
    cluster = dispy.JobCluster(compute)
    # run 'compute' with 20 random numbers on available CPUs
    jobs = []
    for i in range(20):
        job = cluster.submit(random.randint(5,20))
        job.id = i # associate an ID to identify jobs (if needed later)
        jobs.append(job)
    # cluster.wait() # waits until all jobs finish
    for job in jobs:
        host, n = job() # waits for job to finish and returns results
        print('%s executed job %s at %s with %s' % (host, job.id, job.start_time, n))
        # other fields of 'job' that may be useful:
        # job.stdout, job.stderr, job.exception, job.ip_addr, job.end_time
    cluster.print_status()  # shows which nodes executed how many jobs etc.

When I run this (python sample.py), it just hangs. Stepping through with pdb, I see it eventually hangs at dispy/__init__.py(117)__call__(). The line reads self.finish.wait(). finish is just a Python threading event, and wait() then goes into lib/python3.5/threading.py(531)wait(). It hangs once it hits that wait.
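
For reference, one way to watch whether the jobs ever change state (instead of blocking forever in job()) is to poll job.status against the DispyJob state constants. This is only a minimal diagnostic sketch, assuming the cluster and jobs objects from sample.py above and the dispy 4.x DispyJob status constants:

import time
import dispy

# poll the jobs built by cluster.submit() in sample.py above
pending = list(jobs)
while pending:
    for job in list(pending):
        # DispyJob exposes its state via 'status'; jobs that stay in
        # Created were never dispatched to any node at all
        if job.status in (dispy.DispyJob.Finished, dispy.DispyJob.Terminated,
                          dispy.DispyJob.Cancelled, dispy.DispyJob.Abandoned):
            pending.remove(job)
        else:
            print('job %s still has status %s' % (job.id, job.status))
    time.sleep(5)  # arbitrary polling interval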

I've tried running dispynode on the client machine and gotten the same results. I've tried a lot of variants of passing nodes into the creation of the cluster, e.g.:

cluster = dispy.JobCluster(compute, nodes=['localhost'])
cluster = dispy.JobCluster(compute, nodes=['*'])
cluster = dispy.JobCluster(compute, nodes=[<hostname of the remote node running dispynode>])

I've tried running with the cluster.wait() line uncommented, and got the same results.

When I added logging (cluster = dispy.JobCluster(compute, loglevel = 10)), I got the following output on the client side:

2016-06-14 10:27:01 asyncoro - version 4.1 with epoll I/O notifier
2016-06-14 10:27:01 dispy - dispy client at :51347
2016-06-14 10:27:01 dispy - Storing fault recovery information in "_dispy_20160614102701"
2016-06-14 10:27:01 dispy - Pending jobs: 0
2016-06-14 10:27:01 dispy - Pending jobs: 1
2016-06-14 10:27:01 dispy - Pending jobs: 2
2016-06-14 10:27:01 dispy - Pending jobs: 3
2016-06-14 10:27:01 dispy - Pending jobs: 4
2016-06-14 10:27:01 dispy - Pending jobs: 5
2016-06-14 10:27:01 dispy - Pending jobs: 6
2016-06-14 10:27:01 dispy - Pending jobs: 7
2016-06-14 10:27:01 dispy - Pending jobs: 8
2016-06-14 10:27:01 dispy - Pending jobs: 9
2016-06-14 10:27:01 dispy - Pending jobs: 10

This seems expected, but it doesn't help me figure out why the jobs aren't running.
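
Since dispynode reports that it is serving at 10.0.48.54:51348 while the client binds to :51347, one basic sanity check is whether that TCP port is even reachable from the client machine (dispy's node discovery itself uses UDP broadcast, which this does not test, but a failure here would explain the hang). A minimal sketch, with the address and port copied from the dispynode output above:

import socket

node_addr = ('10.0.48.54', 51348)  # address/port printed by dispynode above

try:
    # if this fails, dispy's TCP job traffic cannot reach the node either
    sock = socket.create_connection(node_addr, timeout=5)
    print('TCP connection to %s:%d succeeded' % node_addr)
    sock.close()
except OSError as exc:
    print('cannot reach %s:%d -> %s' % (node_addr + (exc,)))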

For what it's worth, here's _dispy_20160614102701.bak:

'_cluster', (0, 207)
'compute_1465918021755', (512, 85)

and similarly, _dispy_20160614102701.dir:

'_cluster', (0, 207)
'compute_1465918021755', (512, 85)

I'm out of guesses, unless I'm using an unstable version.

Scott Mermelstein
  • I also have this problem. Is there a solution to it? – lskrinjar Jul 01 '16 at 07:19
    I haven't found one. In fact, I gave up on dispy, so I didn't even bother putting a bounty on this. I also tried [scoop](https://github.com/soravux/scoop), which on the surface does exactly what I need, but it had a very weird, [arbitrary limit for the maximum number of processors I could usefully add](https://groups.google.com/forum/#!topic/scoop-users/WlmqPzlsdec). I've given up, and decided to use basic popen of ssh, and write my own scheduler. – Scott Mermelstein Jul 01 '16 at 13:42
  • @ThomasGuenet You made a suggested edit that I'm going to reject. The edit is inappropriate, because you're changing what I actually said I did. I did run `python dispynode.py`, not just `dispynode.py`. There is a difference in how they run, and that difference _might_ be why the program was hanging. So your edit is inappropriate, but it may make a good answer. Write it up as an answer, showing how running just `dispynode.py` instead of `python dispynode.py` would fix the problem. If you show it convincingly, you'll have answered this question. – Scott Mermelstein Jan 05 '17 at 16:18
  • @ScottMermelstein I suggested you change `python dispynode.py` executed from the shell to `dispynode.py` directly: this command makes it possible to launch jobs on the node. I was using dispy yesterday, experienced the same issue, and solved it today on my computer. I just don't know if it's the same issue you have. I will post it. – ThomasGuenet Jan 06 '17 at 09:25

4 Answers


When first setting up and using dispy on a network, I found that I had to specify the client machine's IP address when creating the job cluster; see below:

cluster = dispy.JobCluster(compute, ip_addr=your_ip_address_here)
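
If you are not sure which address to pass, one common trick (a sketch here, not part of dispy's API) is to open a UDP socket towards the node and read back which local address the OS chose; the 10.0.48.54 node address below is just the one from the question, and any reachable host on that network would do:

import socket
import dispy

# no packets are actually sent by connect() on a UDP socket; the OS just
# selects the local interface that would be used to reach that address
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(('10.0.48.54', 51348))
your_ip_address = s.getsockname()[0]
s.close()

# 'compute' as defined in sample.py above
cluster = dispy.JobCluster(compute, ip_addr=your_ip_address)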

See if that helps.

Dave
0


If you're just running sample.py on your client, change the following line in your `__main__` block:

cluster = dispy.JobCluster(compute, nodes=['nodeip_1', 'nodeip_2', ..., 'nodeip_n'])

Then run it in your IDE, or via shell.

I hope that helps.

  • Thanks for your answer. I had tried `nodes=['nodename']` before, and it didn't work. Based on your suggestion, I tried `nodes=['nodeip']`, and it still hangs. For some reason, it's never communicating with the client. – Scott Mermelstein Jun 14 '16 at 19:11
  • If your cluster is on the same local network, try launching the dispynode script on the node this way: `python dispynode.py -i pcname` (or IP address). Then run the script as I described in my previous comment above. – user6466166 Jun 14 '16 at 19:55
  • Using either of those gives me `OSError: [Errno 99] Cannot assign requested address` (in line 252 of dispynode.py: `self.tcp_sock.bind((ip_addr, node_port))`). – Scott Mermelstein Jun 14 '16 at 20:02
0

Before executing python sample.py, dispynode.py should still be running on localhost or on the other machine (note that the other machine should be on the same network if you do not want to specify more complex options).

I was experiencing the same issue and solved it this way:

  • open a terminal and execute: $ dispynode.py (do not terminate it)
  • open a second terminal and execute: $ python sample.py

Do not forget that the compute function just sleeps for a random number of seconds per job, so output may take 20 seconds or more to appear after executing sample.py.

ThomasGuenet
  • Well, it was worth a shot, but seems to not matter whether I used `python dispynode.py` or just `dispynode.py`. I get the same result with my client - it hangs in the wait() condition. I tried without setting nodes on the cluster, and with setting nodes to both ['hostname'] and ['hostip']. In all cases, I get the same results with `dispynode.py` as I did with `python dispynode.py`. – Scott Mermelstein Jan 06 '17 at 15:46
0


Try this instead (the exact path will depend on your Python version and user name):

python /home/$user_name/.local/lib/python3.9/site-packages/dispy/dispynode.py
python sample.py

It worked for me

mr-suroot