My attempt to speed up one of my applications using multiprocessing resulted in lower performance. I am sure it is a design flaw, but that is the point of this question: how can I better approach this problem to actually take advantage of multiprocessing?
My current results on a 1.4 GHz Atom:
- SP Version = 19 seconds
- MP Version = 24 seconds
Both versions of the code can be copied and pasted for review. The dataset is at the bottom and can be pasted in as well. (I decided against using xrange, to better illustrate the problem.)
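The timings are plain wall-clock measurements taken along these lines (any equivalent timer would do):

import time

t0 = time.time()
calc()  # or process() for the MP version
print("elapsed: %.1f seconds" % (time.time() - t0))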
First the SP version:
*PASTE DATA HERE*
def calc():
    # brute-force walk of the full Cartesian product of D1..D7
    for i, valD1 in enumerate(D1):
        for i, valD2 in enumerate(D2):
            for i, valD3 in enumerate(D3):
                for i, valD4 in enumerate(D4):
                    for i, valD5 in enumerate(D5):
                        for i, valD6 in enumerate(D6):
                            for i, valD7 in enumerate(D7):
                                sol1 = float(valD1[1]+valD2[1]+valD3[1]+valD4[1]+valD5[1]+valD6[1]+valD7[1])
                                sol2 = float(valD1[2]+valD2[2]+valD3[2]+valD4[2]+valD5[2]+valD6[2]+valD7[2])
    return None

print(calc())
Now the MP version:
import multiprocessing
import itertools
*PASTE DATA HERE*
def calculate(vals):
    # vals is one 7-tuple from itertools.product; each element is a [x[1], x[2]] pair
    valD1, valD2, valD3, valD4, valD5, valD6, valD7 = vals
    sol1 = float(valD1[0]+valD2[0]+valD3[0]+valD4[0]+valD5[0]+valD6[0]+valD7[0])
    sol2 = float(valD1[1]+valD2[1]+valD3[1]+valD4[1]+valD5[1]+valD6[1]+valD7[1])
    return None
def process():
    pool = multiprocessing.Pool(processes=4)
    prod = itertools.product(([x[1], x[2]] for x in D1),
                             ([x[1], x[2]] for x in D2),
                             ([x[1], x[2]] for x in D3),
                             ([x[1], x[2]] for x in D4),
                             ([x[1], x[2]] for x in D5),
                             ([x[1], x[2]] for x in D6),
                             ([x[1], x[2]] for x in D7))
    # each task is a single 7-tuple; chunksize=2500 batches them per IPC round trip
    result = pool.imap(calculate, prod, chunksize=2500)
    pool.close()
    pool.join()
    return result

if __name__ == "__main__":
    print(process())
And the data for both:
D1 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
D2 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
D3 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
D4 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
D5 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
D6 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
D7 = [['A',7,4],['B',3,7],['C',6,1],['D',12,6],['E',4,8],['F',8,7],['G',11,3],['AX',11,7],['AX',11,2],['AX',11,4],['AX',11,4]]
And now the theory:
Since the actual work per task is tiny (summing seven ints), the job is dominated by overhead: every tuple from the product has to be pickled and shipped to a worker, so interprocess communication costs far more than the computation it delivers. This looks like a situation where I really need true multithreading, so at this point I am looking for suggestions before I try a different language because of the GIL.
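If I stay with multiprocessing, the obvious variant is to make each task much coarser, so there are 11 IPC round trips instead of millions: ship each element of D1 to a worker and let the worker run the six inner loops locally against the module-level D2..D7. A sketch of that idea (not my measured code):

import multiprocessing

def calc_batch(valD1):
    # one task = one element of D1; the remaining six loops run
    # inside the worker, so almost no data crosses process boundaries
    for valD2 in D2:
        for valD3 in D3:
            for valD4 in D4:
                for valD5 in D5:
                    for valD6 in D6:
                        for valD7 in D7:
                            sol1 = float(valD1[1]+valD2[1]+valD3[1]+valD4[1]+valD5[1]+valD6[1]+valD7[1])
                            sol2 = float(valD1[2]+valD2[2]+valD3[2]+valD4[2]+valD5[2]+valD6[2]+valD7[2])
    return None

if __name__ == "__main__":
    # D1..D7 are defined at module level, so forked (or re-imported) workers see them
    pool = multiprocessing.Pool(processes=4)
    results = pool.map(calc_batch, D1)  # only len(D1) == 11 tasks in total
    pool.close()
    pool.join()

With only 11 tasks the load is a bit lumpy across 4 workers, but the per-task payload drops from millions of pickled tuples to a single three-element list.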
********Debugging
File "calc.py", line 309, in <module>
smart_calc()
File "calc.py", line 290, in smart_calc
results = pool.map(func, chunk_list)
File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 250, in map
return self.map_async(func, iterable, chunksize).get()
File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 554, in get
raise self._value
TypeError: sequence index must be integer, not 'slice'
In this case, totallen = 108 and CHUNKS is set to 2. When CHUNKS is reduced to 1, it works.
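My guess is that somewhere a slice is being applied to an object that only supports integer indexing: in Python 2, slicing an xrange raises exactly this TypeError, and itertools objects cannot be sliced at all; with CHUNKS = 1 the code presumably never slices. Chunking with itertools.islice avoids slicing altogether; a sketch (the smart_calc names in the usage comment are hypothetical):

import itertools

def chunks(iterable, size):
    # yield successive lists of `size` items without ever slicing,
    # so it works for xrange, generators, and itertools objects alike
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            break
        yield chunk

# hypothetical usage inside smart_calc:
# for chunk_list in chunks(prod, totallen // CHUNKS):
#     results = pool.map(func, chunk_list)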