I am trying to use Parallel Python with Python 3.5 on Windows 10. I'm new to this, so please excuse the terminology.

I have installed Python and all of the required packages on each of the computers (nodes) and have been running a batch script on each of the nodes to make them visible to the root computer:

python ppserver.py -p 35000 -a -w 4 -s "secretword"

To illustrate the problem, here is a simple example that I found on the internet and edited. It should find every node, but only some of them are found, and it is always the same nodes that are missing:

import math, time, sys, _thread
import pp
import numpy as np
import pandas as pd
import os

print("Begin")

# class for callbacks - This class is to allow the output of each node to be 
# safely combined
class Sum:
    def __init__(self):
        self.value = 0.0
        self.lock = _thread.allocate_lock()
        self.count = 0

    # the callback function
    def add(self, value):
        # we must use a lock here because += is not atomic;
        # both counters are updated while the lock is held
        with self.lock:
            self.count += 1
            self.value += value

# This is the function that is sent to each node for independent analysis
def part_sum(start, end):
    """Calculates partial sum"""
    total = 0  # local accumulator (avoids shadowing the built-in sum)
    for x in range(int(start), int(end)):
        if x % 2 == 0:
            total -= 1.0 / x
        else:
            total += 1.0 / x
    return total

# Control script - run by the master to control the analysis and each of the 
# nodes
print("""Usage: python callback.py [ncpus]
    [ncpus] - the number of workers to run in parallel, 
    if omitted it will be set to the number of processors in the system
    """)

start = 1
end = 200000000

# Divide the task into 128 subtasks
parts = 128
step = (end - start) / parts + 1

# tuple of all parallel python servers to connect with
ppservers = ("*:35000",) #find all available servers listening on port 35000!!!

if len(sys.argv) > 1:
    ncpus = int(sys.argv[1])

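    # Creates jobserver with ncpus workers; the secret must match the
    # -s value that the ppserver.py instances were started with
    job_server = pp.Server(ncpus, ppservers=ppservers, secret="secretword")
    print("Starting pp with", ncpus, "workers")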
else:
    # Creates jobserver with automatically detected number of workers
    # also uses 2 local cpus!
    job_server = pp.Server(ppservers=ppservers,secret="secretword")
    #ncpus = job_server.get_ncpus() - 2
    print("Starting pp with auto discovery")

# Create an instance of callback class
sum = Sum()

# Record the start time, then submit each subtask to the job server
start_time = time.time()
for index in range(parts):
    starti = int(start+index*step)
    endi = int(min(start+(index+1)*step, end))
    # Submit a job which will calculate partial sum 
    # part_sum - the function
    # (starti, endi) - tuple with arguments for part_sum
    # callback=sum.add - callback function

    job_server.submit(part_sum, (starti, endi), callback=sum.add)

#wait for jobs in all groups to finish 
job_server.wait()

# Print the partial sum
print("Partial sum is", sum.value, "| diff =", math.log(2) - sum.value)

job_server.print_stats()

print("Done")
# Parallel Python Software: http://www.parallelpython.com

All computers are running Windows 10 and have the same version of Python and the used packages. The batch scripts are running. The computers are all on the same network. What are the possible reasons that I'm not seeing all of the nodes?

Thanks

jlt199
  • Maybe the nodes that aren't found have a software firewall enabled which blocks the connection? – John Zwinck May 24 '17 at 15:04
  • I've tried changing the setting of the firewall to let Python through, but it doesn't seem to make any difference. Also, the computers that are connecting have the same firewall and I haven't changed any settings on those. – jlt199 May 24 '17 at 15:11

1 Answer

The problem was that the computers that wouldn't connect were running VirtualBox. After I uninstalled it from those machines, they were found without any problem.
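
If uninstalling VirtualBox is not an option, one thing that may be worth trying is to skip autodiscovery and list the nodes explicitly. This is only a sketch under an assumption the post does not confirm: that the virtual network adapter installed by VirtualBox interferes with the broadcast-based discovery behind "*:35000", while a direct connection to the node still works. The helper names `port_is_open` and `reachable_ppservers` are made up for illustration; only the "host:port" form of `ppservers` comes from Parallel Python itself.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def reachable_ppservers(hosts, port=35000, timeout=2.0):
    """Build a ppservers tuple from the hosts that accept a TCP connection."""
    return tuple(
        "%s:%d" % (host, port)
        for host in hosts
        if port_is_open(host, port, timeout)
    )
```

With a hypothetical list such as `nodes = ["node01", "node02"]`, the result of `reachable_ppservers(nodes)` could be passed as `ppservers=` to `pp.Server` in place of `("*:35000",)`. A successful connection only shows that the port is reachable. It does not prove that the secret matches or that the node will accept jobs. A node that is missing from the tuple points to a firewall or routing problem rather than a discovery problem.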

jlt199