Python multiprocessing.starmap Process Count & Speed

Question

I have a genetic algorithm whose fitness function is a cumbersome simulation environment. The code is entirely CPU-bound and runs on 64-bit Python 3.3, so I have used multiprocessing.Pool.starmap_async to parallelize it.

It works very well, with a large speed-up over serial execution. On my processor, an Intel i7 CPU @ 2.40 GHz (with 16 GB RAM), I see run-times of 8 to 9 seconds with 4 processes (slower with 2 processes, and slower still in serial).

However, this only utilizes 65 to 73% of my processor.
Increasing the process count to 6 utilizes 95 to 100% of the processor, but the runtime rises to 11 seconds. Memory usage still sits around 20%.

Increasing the count to 8 keeps the processor constantly at 100%, but the runtime is now 12 seconds. Memory is just fine.

I can't post everything, but below is the multiprocessing call (with the arguments removed). Is there anything I can do to utilize more of the processor without a slow-down? I'd also appreciate any help understanding why this phenomenon is occurring.

Multiprocessing call:

        step = pop_size // 4  # integer batch size; np.int8 would overflow above 127
        pol = Pool(processes=4)

        res = pol.starmap_async(SimWorker, ((i, i+step, pop, "a bunch of other arguments") for i in range(0, pop_size, step)))
        fitnessdict = res.get()
        fitnessdict = np.asarray(fitnessdict)
        for i in range(0, pop_size, step):
            for p in range(i, i+step):
                fitness[p] = fitnessdict[i // step, p]  # // keeps the row index an integer in Python 3

SimWorker:

def SimWorker(start, stop, pop, "a bunch of other arguments"):
    """
    Run a batch of sims
    """

    for p in range(start, stop):
        fitness[p] = Sim_T(pop[p],"a bunch of other arguments")
    return(fitness)
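
For anyone who wants to reproduce the pattern, here is a minimal self-contained sketch of the same batching idea. It is not the original code: `sim_t` is a hypothetical stand-in for the real simulation, `scale` is a made-up placeholder for the elided arguments, and each worker returns only its own batch rather than the whole fitness array.

```python
from multiprocessing import Pool


def sim_t(member, scale):
    """Hypothetical stand-in for the real simulation (Sim_T in the question)."""
    return sum((member * k) % 7 for k in range(1000)) * scale


def sim_worker(start, stop, pop, scale):
    """Evaluate one batch and return only that batch's fitness values."""
    return [sim_t(pop[p], scale) for p in range(start, min(stop, len(pop)))]


def evaluate(pop, processes=4, scale=1.0):
    """Split pop into one batch per process and gather fitness in order."""
    step = max(1, -(-len(pop) // processes))  # ceiling division
    args = [(i, i + step, pop, scale) for i in range(0, len(pop), step)]
    with Pool(processes=processes) as pool:
        batches = pool.starmap_async(sim_worker, args).get()
    # Results come back in submission order, so flattening preserves indices.
    return [f for batch in batches for f in batch]


if __name__ == "__main__":
    population = list(range(16))
    print(evaluate(population))
```

Returning only the batch keeps the amount of data pickled back to the parent proportional to the batch size, which may or may not matter for the real simulation.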

If you have hyperthreading (sounds like you do), the extra virtual threads will compete for the same physical CPU core. So on an i7 with 4 physical cores and 8 virtual cores, when you run a perfectly CPU-bound task with more than 4 processes, you incur overhead from that competition. — mdscruggs, Jul 18 '13 at 14:05
Yes I have hyperthreading (sorry should've mentioned that). I was expecting results similar to: http://eli.thegreenplace.net/2012/01/16/python-parallelizing-cpu-bound-tasks-with-multiprocessing/ Is this a problem of Windows, and a different OS would more efficiently handle this? — Dergs McGreggin, Jul 18 '13 at 14:08
What happens if you increase the `step` size, thereby making each `SimWorker` do more work before ending? — unutbu, Jul 18 '13 at 14:16
It depends on the complexity and diversity of the CPU-bound tasks. The more purely CPU-bound and uniform the tasks are, the less benefit you get from virtual threads because the same hardware on each core is being used simultaneously a higher portion of the time. Once you pass into negative performance per process, that's a good sign, because you are getting the most you can out of your machine for that particular type of task. 100% of 8 cores (half of which are virtual) is a bit misleading if all the hardware is being used at 50%. Try turning off hyperthreading as an experiment. — mdscruggs, Jul 18 '13 at 14:16
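
As a rough illustration of the point above, the pool size can be derived from the logical CPU count. This is only a sketch: `suggested_pool_size` is a hypothetical helper, and the value `threads_per_core=2` is an assumption about hyperthreaded Intel CPUs, not something the code detects.

```python
import multiprocessing


def suggested_pool_size(logical_cpus, threads_per_core=2):
    """Estimate the physical core count from the logical CPU count.

    threads_per_core=2 is an assumption (typical for hyperthreading);
    it is not detected from the hardware.
    """
    return max(1, logical_cpus // threads_per_core)


# cpu_count() reports logical CPUs, so hyperthreads are included.
logical = multiprocessing.cpu_count()
print("logical CPUs:", logical)
print("suggested processes:", suggested_pool_size(logical))
```

If installing a third-party package is an option, `psutil.cpu_count(logical=False)` reports physical cores directly instead of guessing.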
related: http://stackoverflow.com/questions/8416370/running-simulation-with-hyperthreading-doubles-runtime — mdscruggs, Jul 18 '13 at 14:28
@unutbu Here are some more runs:
8 processes, step size of pop/8 = 12.12
8 processes, step size of pop/4 = 14.08
8 processes, step size of pop/2 = 14.52
4 processes, step size of pop/8 = 9.12
4 processes, step size of pop/4 = 8.31
4 processes, step size of pop/2 = 9.46
— Dergs McGreggin, Jul 18 '13 at 14:30
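
A comparison like the one above could be scripted roughly as follows. This is a sketch under stated assumptions: `burn` is a hypothetical CPU-bound placeholder, not the real simulation, and the population size and work amount are arbitrary. The timing includes pool start-up, so absolute numbers will not match the ones reported here.

```python
import time
from multiprocessing import Pool


def burn(n):
    """Hypothetical CPU-bound placeholder for one simulation run."""
    total = 0
    for k in range(n):
        total += k * k % 7
    return total


def run_batch(start, stop, work):
    """Run the placeholder task once per member index in [start, stop)."""
    return [burn(work) for _ in range(start, stop)]


def time_config(pop_size, processes, n_batches, work=20000):
    """Return wall-clock seconds for one pool size / batch count pairing."""
    step = max(1, pop_size // n_batches)
    args = [(i, min(i + step, pop_size), work) for i in range(0, pop_size, step)]
    t0 = time.perf_counter()
    with Pool(processes=processes) as pool:
        pool.starmap(run_batch, args)
    return time.perf_counter() - t0


if __name__ == "__main__":
    for processes in (2, 4, 8):
        for n_batches in (2, 4, 8):
            secs = time_config(64, processes, n_batches)
            print(processes, "processes,", n_batches, "batches:", round(secs, 2))
```

Repeating each configuration a few times and taking the minimum would give steadier numbers than a single run.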
@mdscruggs The tasks are quite uniform except for one case: the controller can drastically fail, in which case the simulation ends early for that member of "pop". Thanks for advice. — Dergs McGreggin, Jul 18 '13 at 14:32
Sorry, I don't have enough reputation for a discussion, so here is my question: would it be useful to use `xrange` instead of `range` ? — Frodon, Jul 18 '13 at 14:06
In Python 3, there is no `xrange`. `range` now behaves the way `xrange` did in Python 2. — mdscruggs, Jul 18 '13 at 14:07
Thanks for the info, I still use Python 2 and I'm not aware of all the changes brought with Python 3 — Frodon, Jul 18 '13 at 14:36
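
A quick way to see the Python 3 behaviour described in this exchange: `range` is a lazy sequence object, so even a very large one takes almost no memory. The size used below is arbitrary.

```python
import sys

big = range(10**12)        # no list of 10**12 integers is built
print(len(big))            # length is computed from start/stop/step
print(sys.getsizeof(big))  # size of the range object itself, not of its items
print(big[10**11])         # items are computed on demand
```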
