I am trying to use Parallel Python with Python 3.5 on Windows 10. I'm new to this, so please excuse the terminology.
I have installed Python and all of the required packages on each computer (node), and I run a batch script on each node to make it visible to the root computer:
```
python ppserver.py -p 35000 -a -w 4 -s "secretword"
```
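One way to narrow the problem down is to test, from the root computer, whether each node's ppserver port accepts TCP connections at all. The sketch below uses only the standard library. It checks only that something is listening on port 35000. It does not check the pp handshake or the secret. A node that fails this check is more likely blocked by a firewall or network setting on that machine than by pp itself.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


def check_nodes(hosts, port=35000, timeout=2.0):
    """Map each host to whether its ppserver port is reachable."""
    return {host: port_is_open(host, port, timeout) for host in hosts}
```

For example, `check_nodes(["192.168.1.10", "192.168.1.11"])` would report which of those addresses answer on port 35000. These addresses are hypothetical placeholders, so substitute the real IPs of your nodes.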
To illustrate the problem, here is a simple example I found online and modified. It should find every node, but only some of them are found, and it is always the same nodes that are missing:
```python
import math
import sys
import threading
import time

import pp

print("Begin")


# Class for callbacks: allows the output of each node to be combined safely
class Sum:
    def __init__(self):
        self.value = 0.0
        self.count = 0
        self.lock = threading.Lock()

    # The callback function; pp calls it once per finished job
    def add(self, value):
        # We must use a lock here because += is not atomic and
        # callbacks can run in different threads
        with self.lock:
            self.count += 1
            self.value += value


# This is the function that is sent to each node for independent analysis
def part_sum(start, end):
    """Calculates the partial sum of the alternating harmonic series."""
    total = 0.0
    for x in range(int(start), int(end)):
        if x % 2 == 0:
            total -= 1.0 / x
        else:
            total += 1.0 / x
    return total


# Control script: run by the master to control the analysis and the nodes
print("""Usage: python callback.py [ncpus]
[ncpus] - the number of workers to run in parallel,
if omitted it will be set to the number of processors in the system
""")

start = 1
end = 200000000

# Divide the task into 128 subtasks
parts = 128
step = (end - start) // parts + 1

# Tuple of all Parallel Python servers to connect with;
# "*:35000" = auto-discover all servers listening on port 35000
ppservers = ("*:35000",)

if len(sys.argv) > 1:
    ncpus = int(sys.argv[1])
    # Create the job server with ncpus local workers; the secret must
    # match the one passed to ppserver.py with -s
    job_server = pp.Server(ncpus, ppservers=ppservers, secret="secretword")
    print("Starting pp with", ncpus, "workers")
else:
    # Create the job server with an automatically detected number of
    # local workers
    job_server = pp.Server(ppservers=ppservers, secret="secretword")
    print("Starting pp with auto discovery")

# Create an instance of the callback class
sum_cb = Sum()

# Submit all jobs and measure the elapsed time
start_time = time.time()
for index in range(parts):
    starti = start + index * step
    endi = min(start + (index + 1) * step, end)
    # Submit a job which will calculate a partial sum:
    #   part_sum        - the function
    #   (starti, endi)  - tuple with the arguments for part_sum
    #   callback        - called with the result of each job
    job_server.submit(part_sum, (starti, endi), callback=sum_cb.add)

# Wait for all jobs to finish
job_server.wait()

# Print the partial sum
print("Partial sum is", sum_cb.value, "| diff =", math.log(2) - sum_cb.value)
print("Elapsed time:", time.time() - start_time, "s")
job_server.print_stats()
print("Done")

# Parallel Python Software: http://www.parallelpython.com
```
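To tell a wrong result apart from missing nodes, it can help to run the same decomposition serially on one machine with a smaller upper bound. The sketch below reuses the `part_sum` logic and the `step` formula from the script. The helper names `make_chunks` and `serial_reference` and the smaller default `end` are my own choices for illustration. The combined sum should approach `math.log(2)`, because the series being summed is the alternating harmonic series.

```python
def part_sum(start, end):
    """Partial sum of the alternating harmonic series over [start, end)."""
    total = 0.0
    for x in range(start, end):
        total += -1.0 / x if x % 2 == 0 else 1.0 / x
    return total


def make_chunks(start, end, parts):
    """Split [start, end) into `parts` contiguous (starti, endi) pairs,
    using the same step formula as the pp script."""
    step = (end - start) // parts + 1
    return [(start + i * step, min(start + (i + 1) * step, end))
            for i in range(parts)]


def serial_reference(start=1, end=2000000, parts=128):
    """Run every chunk locally, one after another, and combine the results."""
    return sum(part_sum(a, b) for a, b in make_chunks(start, end, parts))
```

If `serial_reference()` is close to `math.log(2)` but the distributed run is not, the decomposition is fine and the problem lies in job execution or node discovery.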
All computers run Windows 10 and have the same versions of Python and of the packages used. The batch scripts are running, and all computers are on the same network. What could be the reasons that I'm not seeing all of the nodes?
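It may also be worth checking whether "the same network" really means the same IPv4 subnet. As far as I know, pp's autodiscovery (`*:35000`) relies on UDP broadcast, and broadcast packets normally do not cross routers. Nodes on another subnet would then have to be listed explicitly by address, for example `"192.168.2.15:35000"`. The sketch below uses the standard `ipaddress` module to group node addresses by subnet. The /24 prefix is an assumption; use your actual netmask (on Windows, `ipconfig` shows it).

```python
import ipaddress


def subnet_of(ip, prefix=24):
    """Return the IPv4 network that `ip` belongs to for the given prefix length."""
    return ipaddress.ip_interface("{}/{}".format(ip, prefix)).network


def group_by_subnet(ips, prefix=24):
    """Group addresses by subnet. More than one group suggests that
    broadcast-based discovery from one machine may not reach all nodes."""
    groups = {}
    for ip in ips:
        groups.setdefault(str(subnet_of(ip, prefix)), []).append(ip)
    return groups
```

With the hypothetical addresses `["192.168.1.10", "192.168.1.11", "192.168.2.15"]`, `group_by_subnet` returns two groups, `"192.168.1.0/24"` and `"192.168.2.0/24"`. If the missing nodes always fall in a different group from the root computer, that would explain why autodiscovery never finds them.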
Thanks