
I don't have much experience with parallel processing in Python. I have a script that takes several FITS files, which are basically images, reads them into 3D numpy arrays and does some calculations on them. I guess the "worst" part is that I have two for loops iterating over two dimensions of the array. Inside the loop, I basically get the slice containing the third dimension of the numpy array at the given x and y coordinates. Then I calculate the maximum value and the index at which that maximum lies, and write the results into two new 2D arrays at the same x and y coordinates.
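In outline, with much smaller illustrative array sizes and made-up variable names, the inner part looks roughly like this:

import numpy

# illustrative stand-in for one FITS cube: 50 x 50 pixels, 20 values along each line of sight
cube = numpy.random.rand(50, 50, 20)
max_map = numpy.zeros((50, 50))             # maximum value per pixel
idx_map = numpy.zeros((50, 50), dtype=int)  # index of that maximum per pixel

for x in range(cube.shape[0]):
    for y in range(cube.shape[1]):
        los = cube[x, y, :]                 # the line of sight at this pixel
        max_map[x, y] = numpy.max(los)
        idx_map[x, y] = numpy.argmax(los)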

For a FITS file with dimensions of about 6000x6000x20, this can take a couple of minutes to finish. I then tried to run this in parallel, since every line of sight (one per x, y pixel) is independent of the others and can therefore be calculated in separate processes.

I looked at some basic tutorials involving multiprocessing, but each time I try it, it takes ten times as long ... I have read in some questions here that multiprocessing can have a lot of overhead. Is it possible that the time spent on that overhead is much longer than the actual calculation done in each process, and that this is the reason it is so much slower?

Thanks.

Here is a sample script I put together.

import numpy, time
import multiprocessing as mp

xs = 500
data = numpy.random.rand(100, xs, xs)   # 100 values along each of the 500x500 lines of sight
data2 = numpy.zeros(shape=(xs, xs))     # result: index of the maximum per pixel

def calculation(los):
    maxindex = numpy.argmax(los)
    return maxindex

# serial version: loop over every pixel
t0 = time.time()
for x in range(xs):
    for y in range(xs):
        los = data[:, x, y]
        data2[x, y] = calculation(los)
t1 = time.time()
print(t1 - t0)

# parallel version: one task per pixel
t0 = time.time()
pool = mp.Pool(processes=4)
results = [pool.apply_async(calculation, args=(data[:, x, y],)) for x in range(xs) for y in range(xs)]
t1 = time.time()

print(t1 - t0)

The first version takes about 1 second, the second version takes 12 seconds on my machine.
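One rough way to see how much of that overhead is just serialisation is to time the pickling that multiprocessing has to do for every single task argument (a sketch; this only captures part of the cost, the rest being the actual inter-process communication and scheduling):

import pickle
import numpy, time

xs = 500
data = numpy.random.rand(100, xs, xs)

# every apply_async call pickles its argument before sending it to a worker
t0 = time.time()
for x in range(xs):
    for y in range(xs):
        pickle.dumps(data[:, x, y])
t1 = time.time()
print(t1 - t0)

If this alone already takes on the order of the serial runtime, the per-pixel tasks are simply too small to be worth farming out individually.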

Pythoneer
    Multiprocessing isn't a silver bullet. Can you include the relevant code? – Jason Aug 03 '15 at 18:52
  • You should batch your processing data and send it over to the worker processes. With multiprocessing, the additional cost is in the data passing between processes. Since you decided to use 4 processes here, you could split your data in 4 and send the chunks, minimising the cost of transfer. – toine Aug 03 '15 at 20:36
  • Can I not somehow have all the processes working on the same data? – Pythoneer Aug 03 '15 at 20:38
  • What the pool does is create a list of jobs, each containing one line of data, that will be passed to one of the 4 processes you have created. Sending each message takes time, and sending 500*500 messages takes even more time. This is why it is very slow. The best you can do is reduce the number of messages to be sent, e.g. just send 4, since you have 4 processes. – toine Aug 03 '15 at 20:53

1 Answer


You could send batches of data, since the message passing is the costly part:

import numpy, time
import multiprocessing as mp

xs = 500
data = numpy.random.rand(100, xs, xs)
data2 = numpy.zeros(shape=(xs, xs))

def calculation(los):
    maxindex = numpy.argmax(los)
    return maxindex

def calculation_batch(los):
    # per-pixel version of calculation() for a whole 3D chunk
    maxindex = numpy.zeros(los.shape[1:], dtype=int)
    for i in range(los.shape[1]):
        for j in range(los.shape[2]):
            maxindex[i, j] = calculation(los[:, i, j])
    return maxindex


t0 = time.time()
for x in range(xs):
    for y in range(xs):
        los = data[:, x, y]
        data2[x, y] = calculation(los)
t1 = time.time()
print(t1 - t0)

# send 4 big chunks (one per process) instead of 250000 tiny messages
t0 = time.time()
pool = mp.Pool(processes=4)
results = [pool.apply_async(calculation_batch, args=(data[:, x:x+250, y:y+250],)) for x in [0, 250] for y in [0, 250]]
t1 = time.time()

print(t1 - t0)

this gives me:

0.787902832031
0.846422195435
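Note that the second timing only measures submitting the four jobs; to actually use the results (and to time the full round trip) you still need to fetch them with .get() and write them back into data2. A sketch of that last step, with chunk coordinates mirroring the list comprehension above:

# fetch the per-chunk results and write them back into the output array
chunks = [(x, y) for x in [0, 250] for y in [0, 250]]
for r, (x, y) in zip(results, chunks):
    data2[x:x+250, y:y+250] = r.get()
pool.close()
pool.join()

With only four messages in each direction, the per-message overhead stays negligible even though each chunk carries a lot of data.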
toine