
I'm testing some functionality of IPython's parallel module, and I think I'm doing something wrong.

I'm testing three different ways to execute a math operation:

  • 1st: using @parallel.parallel(view=dview, block=True) and the map method
  • 2nd: using a normal single-core Python function
  • 3rd: using the client's load-balanced view and its map method

I have this code:

from IPython import parallel
import numpy as np
import multiprocessing as mp
import time

rc = parallel.Client(block=True)
dview = rc[:]
lbview = rc.load_balanced_view()

@parallel.require(np)
def suma_pll(a, b):
    return a + b

@parallel.require(np)
def producto_pll(a, b):
    return a * b

def suma(a, b):
    return a + b

def producto(a, b):
    return a * b

# 1st approach: parallel map across all engines (dview)
@parallel.parallel(view=dview, block=True)
@parallel.require(np)
@parallel.require(suma_pll)
@parallel.require(producto_pll)
def a_calc_pll(a, b):
    result = []
    for i, v in enumerate(a):
        result.append(
            producto_pll(suma_pll(a[i], a[i]), suma_pll(b[i], b[i]))//100
        )
    return result

# 3rd approach: function submitted via the load-balanced view
@parallel.require(suma)
@parallel.require(producto)
def a_calc_remote(a, b):
    result = []
    for i, v in enumerate(a):
        result.append(
            producto(suma(a[i], a[i]), suma(b[i], b[i]))//100
        )
    return result

# 2nd approach: plain single-core computation
def a_calc(a, b):
    return producto(suma(a, a), suma(b, b))//100

def main_pll(a, b):
    return a_calc_pll.map(a, b)

def main_lb(a, b):
    c = lbview.map(a_calc_remote, a, b, block=True)
    return c

def main(a, b):
    c = []
    for i in range(len(a)):
        c += [a_calc(a[i], b[i]).tolist()]
    return c

if __name__ == '__main__':
    a, b = [], []

    for i in range(1, 1000):
        a.append(np.array(range(i+00, i+10)))
        b.append(np.array(range(i+10, i+20)))

    t = time.time()
    c1 = main_pll(a, b)
    t1 = time.time()-t

    t = time.time()
    c2 = main(a, b)
    t2 = time.time()-t

    t = time.time()
    c3 = main_lb(a, b)
    t3 = time.time()-t    

    print(str(c1) == str(c2))
    print(str(c3) == str(c2))
    print('%f secs (multicore)' % t1)
    print('%f secs (singlecore)' % t2)
    print('%f secs (multicore_load_balance)' % t3)

My results are:

True
True
0.040741 secs (multicore)
0.004004 secs (singlecore)
1.286592 secs (multicore_load_balance)

Why are my multicore routines slower than my single-core routine? What is wrong with this approach, and what can I do to fix it?

Some information: Python 3.4.1, IPython 2.2.0, NumPy 1.9.0; ipcluster starts 8 engines with LocalEngineSetLauncher.


1 Answer


It seems to me that you are trying to parallelise something that takes too little time to execute on a single core. In Python, any form of "true" parallelism is multi-process, which means you have to spawn multiple Python interpreters, transfer the data via pickling/unpickling, and so on.

This results in noticeable overhead for small workloads. On my system, just starting and immediately stopping a Python interpreter takes around 1/100 of a second:

# time python -c "pass"

real    0m0.018s
user    0m0.012s
sys     0m0.005s

I am not sure what the decorators you are using do behind the scenes, but as you can see, just setting up the infrastructure for parallel work can take quite a bit of time.
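As a rough way to see that fixed cost in isolation (a minimal sketch, assuming a running ipcluster as in the question; the noop helper is made up for illustration), you can time a remote call that does no work at all:

from IPython import parallel
import time

rc = parallel.Client(block=True)
dview = rc[:]

def noop():
    return None

# Time a remote call that performs no computation: what remains is
# the per-call scheduling, messaging and (un)pickling overhead.
t = time.time()
dview.apply_sync(noop)
print('%f secs for a no-op call on all engines' % (time.time() - t))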

Edit:

On further inspection, it looks like you are already setting up the workers before running your code, so the startup overhead hinted at above is probably not the issue.

You are, however, moving data to the worker processes: two lists of roughly 1000 NumPy arrays. Pickling a and b to a string on my system takes ~0.13 seconds with pickle and ~0.046 seconds with cPickle. The pickling time can be reduced by storing your data in 2-D NumPy arrays instead of lists of arrays:

a = np.array(a)
b = np.array(b)

This cuts down the cPickle time to ~0.029 seconds.
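For reference, here is a minimal timing sketch of the difference (assuming Python 3 as in the question, where pickle already uses the C implementation that cPickle provided on Python 2):

import pickle
import time
import numpy as np

# Rebuild the question's data: ~1000 small integer arrays.
a = [np.array(range(i, i + 10)) for i in range(1, 1000)]

t = time.time()
s = pickle.dumps(a)
print('list of arrays:   %f secs, %d bytes' % (time.time() - t, len(s)))

# A single 2-D array pickles as one contiguous buffer, avoiding
# the per-object overhead of pickling each small array separately.
t = time.time()
s = pickle.dumps(np.array(a))
print('single 2-D array: %f secs, %d bytes' % (time.time() - t, len(s)))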

  • Using this approach (`a = np.array(a); b = np.array(b)`), my result is: True; True; 0.043795 secs (multicore); 0.010011 secs (singlecore); 1.541814 secs (multicore_load_balance). The single-core function increased its time, but the other functions didn't. – xmn Sep 26 '14 at 16:42