I have an existing bit of Python code that runs in parallel across the cores in my machine. The job it performs is basically: open an input file, read the contents, perform some fairly heavy maths, write the results to an output file, then take the next file in the for loop and do it again. To make this parallel across many cores I use the Pool function in the multiprocessing library. As a quick example:
import multiprocessing
import time

data = (
    ['a', '2'], ['b', '4'], ['c', '6'], ['d', '8'],
    ['e', '1'], ['f', '3'], ['g', '5'], ['h', '7']
)

def mp_worker(args):
    # Each item of data is an [inputs, the_time] pair
    inputs, the_time = args
    print(" Process %s\tWaiting %s seconds" % (inputs, the_time))
    time.sleep(int(the_time))
    print(" Process %s\tDONE" % inputs)

def mp_handler():
    # One worker process per core on an 8-core machine
    p = multiprocessing.Pool(8)
    p.map(mp_worker, data)

if __name__ == '__main__':
    mp_handler()
This example just shows how I've used multiprocessing.Pool across 8 cores. The mp_worker function in my real code is much more complex, but you get my drift.
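To make the shape of the real job a bit more concrete, here is a minimal Python 3 sketch of the read, compute, write loop run through a pool. Everything specific in it is an assumption for illustration: the `heavy_maths` placeholder, one number per line as the file format, and the `.out` suffix for results. It is not the actual worker.

```python
import multiprocessing
import sys


def heavy_maths(values):
    # Placeholder for the real computation (assumption): sum of squares.
    return sum(v * v for v in values)


def process_file(in_path):
    # Read one number per line (assumed format), compute, write the result.
    with open(in_path) as f:
        values = [float(line) for line in f if line.strip()]
    out_path = in_path + ".out"  # assumed naming scheme
    with open(out_path, "w") as f:
        f.write("%r\n" % heavy_maths(values))
    return out_path


def process_all(in_paths, processes=8):
    # Same Pool/map pattern as above, one task per input file.
    with multiprocessing.Pool(processes) as pool:
        return pool.map(process_file, in_paths)


if __name__ == "__main__":
    # Usage: python script.py input1.txt input2.txt ...
    if sys.argv[1:]:
        print(process_all(sys.argv[1:]))
```

Because each task is just a file path in and a file path out, nothing in `process_file` depends on which machine runs it, as long as that machine can see the files.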
I've come to realise that the network I'm working on has several machines sitting idle for 99% of their time. I therefore wondered if there is a way to make use of their cores as well as my local cores in this code.
In pseudocode, it could become something like:
def mp_handler():
    # Hypothetical servers/ncores arguments; multiprocessing.Pool does not accept these
    p = multiprocessing.Pool(servers=['localhost', '192.168.0.1', '192.168.0.2'], ncores=[8, 8, 4])
    p.map(mp_worker, data)
Where I can now specify both my local machine and other IP addresses as servers, together with the number of cores I'd like to use on each machine.
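Whatever library ends up providing this, something has to decide how many jobs each machine receives. As a rough sketch of one possible policy, the job list can be split in proportion to the requested core counts. The `split_by_cores` helper and the static, proportional split are assumptions for illustration; `multiprocessing.Pool` itself hands out work dynamically and has no notion of hosts.

```python
def split_by_cores(jobs, servers, ncores):
    """Return (server, jobs_slice) pairs, each slice sized by core count."""
    if len(servers) != len(ncores):
        raise ValueError("servers and ncores must have the same length")
    total = sum(ncores)
    if total <= 0:
        raise ValueError("at least one core is required")
    jobs = list(jobs)
    shares = []
    start = 0
    cores_seen = 0
    for server, cores in zip(servers, ncores):
        cores_seen += cores
        # Cumulative rounding keeps the slices contiguous and loses no job.
        end = len(jobs) * cores_seen // total
        shares.append((server, jobs[start:end]))
        start = end
    return shares


data = [['a', '2'], ['b', '4'], ['c', '6'], ['d', '8'],
        ['e', '1'], ['f', '3'], ['g', '5'], ['h', '7']]
servers = ['localhost', '192.168.0.1', '192.168.0.2']  # addresses from the pseudocode
for server, share in split_by_cores(data, servers, [8, 8, 4]):
    print(server, [name for name, _ in share])
```

A static split like this is simple but can leave a machine idle if its jobs happen to be short, which is one reason a shared work queue is often preferred.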
Since the other machines on my network are owned by me and are not internet connected, I'm not fussed about using SSH for security purposes.
Googling around, I've noticed that the pathos and scoop libraries may be able to help me with this. It looks like pathos has very similar commands to the multiprocessing library, which really appeals to me. However, in both cases I can't find a simple example showing me how to convert my local parallel job into a distributed parallel job. I'm keen to stay as close to the Pool/map functionality of the multiprocessing library as possible.
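For comparison, the standard library can already share work between machines through `multiprocessing.managers.BaseManager`, although it exposes queues rather than a Pool/map interface, and it is a different technique from what pathos or scoop do. The following is only a sketch under assumptions: the port `50000`, the authkey, the `JobManager` class, and the `get_tasks`/`get_results` names are all made up for illustration, and `worker_loop` takes a hypothetical stand-in for `mp_worker`. One machine would call `serve_queues()`, and each worker machine would call `connect_queues()` and run `worker_loop` once per core.

```python
import queue
from multiprocessing.managers import BaseManager

PORT = 50000            # assumed free port
AUTHKEY = b"change-me"  # assumed shared secret
STOP = None             # sentinel telling a worker to quit

task_queue = queue.Queue()
result_queue = queue.Queue()


def _get_tasks():
    return task_queue


def _get_results():
    return result_queue


class JobManager(BaseManager):
    """Hypothetical manager exposing the two queues over the network."""


def serve_queues(host=""):
    # Run on the coordinating machine; blocks until the process is killed.
    JobManager.register("get_tasks", callable=_get_tasks)
    JobManager.register("get_results", callable=_get_results)
    manager = JobManager(address=(host, PORT), authkey=AUTHKEY)
    manager.get_server().serve_forever()


def connect_queues(host):
    # Run on each worker machine; returns proxies to the shared queues.
    JobManager.register("get_tasks")
    JobManager.register("get_results")
    manager = JobManager(address=(host, PORT), authkey=AUTHKEY)
    manager.connect()
    return manager.get_tasks(), manager.get_results()


def worker_loop(tasks, results, func):
    # Pull jobs until the sentinel arrives; works with any queue-like object.
    while True:
        job = tasks.get()
        if job is STOP:
            break
        results.put((job, func(job)))
```

In this sketch the coordinator would fill `task_queue` with the jobs, plus one `STOP` per worker, before calling `serve_queues()`. The authkey is the only protection, which fits a trusted, offline network but not much else.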
Any help or examples would be much appreciated!