I have a genetic algorithm whose fitness function is a cumbersome simulation environment. The code is entirely CPU-bound and runs on 64-bit Python 3.3, so I have used multiprocessing.Pool.starmap_async to parallelize it.
It works very well, with large gains in speed over the serial version. On my processor, an Intel i7 CPU @ 2.40 GHz (with 16 GB RAM), I see run-times of 8 to 9 seconds with 4 processes (slower with 2 processes, and slower still when run serially).
HOWEVER, this only utilizes 65 to 73% of my processor.
Increasing the process count to 6 utilizes 95 to 100% of the processor, but the runtime rises to 11 seconds. Memory usage still sits around 20%.
Increasing the count to 8 keeps the processor constantly at 100%, but the runtime is now 12 seconds. Memory is still fine.
I can't post everything, but below is the multiprocessing call (with the arguments removed). Is there anything I can do to utilize more of the processor without a slow-down? I'd also appreciate any help understanding why this phenomenon occurs.
Multiprocessing call:
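step = np.int8(pop_size/4)  # batch size: one quarter of the population per worker
pol = Pool(processes=4)
res = pol.starmap_async(SimWorker, ((i, i+step, pop, "a bunch of other arguments") for i in range(0, pop_size, step)))
fitnessdict = res.get()  # blocks until every batch has finished
fitnessdict = np.asarray(fitnessdict)
# Row i//step holds the fitness array returned by the batch that started at i
for i in range(0, pop_size, step):
    for p in range(i, i+step):
        fitness[p] = fitnessdict[i//step, p]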
SimWorker:
def SimWorker(start, stop, pop, "a bunch of other arguments"):
    """
    Run a batch of sims
    """
    for p in range(start, stop):
        fitness[p] = Sim_T(pop[p], "a bunch of other arguments")
    return fitness
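For anyone who wants to reproduce the timing comparison without my simulation, here is a minimal, self-contained sketch of the same batching pattern. It is not my actual code: `fake_sim` is a hypothetical CPU-bound stand-in for Sim_T, and `sim_worker`, `batch_bounds`, and `evaluate` are illustrative names. Unlike my SimWorker, each worker here returns only its own batch of fitness values, which is an assumption made to keep the example simple.

```python
import time
from multiprocessing import Pool


def fake_sim(individual, n_steps=20000):
    """Hypothetical CPU-bound stand-in for Sim_T: a pure-Python arithmetic loop."""
    total = 0.0
    for k in range(n_steps):
        total += (individual * k) % 7
    return total


def sim_worker(start, stop, pop):
    """Evaluate pop[start:stop] and return only that batch's fitness values."""
    return [fake_sim(ind) for ind in pop[start:stop]]


def batch_bounds(pop_size, n_batches):
    """Split range(pop_size) into contiguous (start, stop) pairs."""
    step = -(-pop_size // n_batches)  # ceiling division
    return [(i, min(i + step, pop_size)) for i in range(0, pop_size, step)]


def evaluate(pop, processes):
    """Run one batch per process and return fitness values in population order."""
    tasks = [(start, stop, pop) for start, stop in batch_bounds(len(pop), processes)]
    with Pool(processes=processes) as pool:
        batches = pool.starmap_async(sim_worker, tasks).get()
    # starmap_async preserves task order, so flattening restores population order
    return [f for batch in batches for f in batch]


if __name__ == "__main__":
    pop = list(range(1, 65))  # assumed toy population of 64 integers
    for processes in (1, 2, 4, 6, 8):
        t0 = time.perf_counter()
        fitness = evaluate(pop, processes)
        elapsed = time.perf_counter() - t0
        print("{} processes: {:.3f} s for {} individuals".format(
            processes, elapsed, len(fitness)))
```

The loop at the bottom times the same workload with 1, 2, 4, 6, and 8 processes, mirroring the process counts I tried above. The timings it prints include pool start-up cost and will depend on the machine, so they are only a rough analogue of my real runs.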