
I have the sketch of my code as follows:

def func1(c):
    return a,b

def func2(c,x):
    if condition:
        a,b = func1(c)
        x.append((a, b))
        func2(a,x)
        func2(b,x)
    return x

x = []
y = func2(c, x)

The problem, as you might have figured out from the code, is that I would like func2(b) to be computed in parallel with func2(a) whenever condition is true, i.e. before b is replaced by a new b produced inside func2(a). But with my algorithm as written, this clearly cannot happen because of the new b's.

I do think such a problem might be perfect for a parallel computing approach, but I have not used it before and my knowledge of it is quite limited. I did try the suggestion from How to do parallel programming in Python, but I got the same result as with the sketch above.

some1
  • Parallelizing this looks difficult because you don't want to spawn new threads for *every* recursive call -- if the recursion is deep you'd use tons of threads (way more than the number of CPUs) and none would be fast. Also, I'm confused when you say "b is replaced by a new b". Each stack frame (recursive call) gets its own `b` in the `a, b =` statement -- whereas the `x` list is the same globally, so all recursive calls will append to the same `x`. – DouglasDD Apr 21 '14 at 20:07
  • I see. Yeah, what I meant by a new `b` is the following: when condition is true, `a_0, b_0 = func1(c)`, then `func2(a_0, x)`; if condition is again true, `a_1, b_1 = func1(a_0)`, and so on. Only after completing all of that does it get to `func2(b, x)`. Therefore the only chance it has is to compute `func2(b_k, x)` where `b_k` is the last `b` produced along the `func2(a, x)` chain. But I would like `func2(b, x)` to be computed for each of the `b`'s, i.e. `b_0, ..., b_{k-1}`. I am not sure if I explained the issue well. – some1 Apr 21 '14 at 20:45

2 Answers


Caveat: threading might not be parallel enough for you (see the note on the Global Interpreter Lock at https://docs.python.org/2/library/threading.html), so you might have to use the multiprocessing library instead (https://docs.python.org/2/library/multiprocessing.html).

...So I've cheated/been lazy and used the thread/process-neutral term "job". You'll need to pick either threading or multiprocessing everywhere that I use "job".

def func1(c):
    return a,b

def func2(c,x):
    if condition:
        a,b = func1(c)
        x.append((a, b))
        a_job = None
        if (number_active_jobs() >= NUM_CPUS):
            # do a and b sequentially
            func2(a, x)
        else:
            a_job = fork_job(func2, a, x)
        func2(b,x)
        if a_job is not None:
            join(a_job)

x = []
func2(c, x)
# all results are now in x (don't need y)

...that will be best if you need the a, b pairs to finish together for some reason. If you're willing to let the scheduler go nuts, you could "job" them all and then join at the end:

def func1(c):
    return a,b

def func2(c,x):
    if condition:
        a,b = func1(c)
        x.append((a, b))
        if (number_active_jobs() >= NUM_CPUS):
            # do a and b sequentially
            func2(a, x)
        else:
            all_jobs.append(fork_job(func2, a, x))
        # TODO: the same job-or-sequential for func2(b,x)

all_jobs = []
x = []
func2(c, x)
for j in all_jobs:
    join(j)
# all results are now in x (don't need y)

The NUM_CPUS check could be done with threading.activeCount() instead of a full-blown worker thread pool (python - how to get the number of active threads started by specific class?).
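For concreteness, here is one way the first pseudocode variant might be fleshed out with the threading module -- just a sketch, reusing `condition` and `func1` from your question (both still undefined here), with NUM_CPUS as a hand-picked cap on concurrent threads and a lock added around the shared list:

import threading

NUM_CPUS = 4  # assumed cap on the number of concurrent threads

def func2(c, x, lock):
    if condition:                      # the same (undefined) test as in the question's sketch
        a, b = func1(c)
        with lock:                     # guard the shared list while several threads append to it
            x.append((a, b))
        a_job = None
        if threading.active_count() >= NUM_CPUS:
            func2(a, x, lock)          # too many threads already: do the a-branch sequentially
        else:
            a_job = threading.Thread(target=func2, args=(a, x, lock))
            a_job.start()              # "fork_job": run the a-branch in its own thread
        func2(b, x, lock)              # the b-branch runs in the current thread in the meantime
        if a_job is not None:
            a_job.join()               # "join(a_job)": wait for the a-branch to finish

x = []
func2(c, x, threading.Lock())
# all results are now in x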

But with multiprocessing you'd have more work to do with a JoinableQueue and a fixed-size Pool of workers.
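As a rough illustration of that route (a sketch only, and not necessarily the JoinableQueue design), one way to use a fixed-size Pool without sharing x across processes is to restructure things so that the parent process drives the recursion and does all the appends itself. Here expand is a hypothetical helper, and the question's undefined condition and func1 are assumed to be usable inside a worker process:

import multiprocessing as mlt

def expand(c):
    # hypothetical helper: one level of the recursion, executed inside a worker process
    if condition:            # assumes the question's (undefined) condition can be evaluated here
        return func1(c)
    return None

if __name__ == '__main__':
    pool = mlt.Pool(processes=4)          # fixed-size pool instead of counting forked jobs
    x = []
    pending = [pool.apply_async(expand, (c,))]
    while pending:
        result = pending.pop().get()      # wait for one submitted job to finish
        if result is not None:
            a, b = result
            x.append((a, b))              # only the parent process touches x, so nothing is shared
            pending.append(pool.apply_async(expand, (a,)))   # both branches run in parallel
            pending.append(pool.apply_async(expand, (b,)))
    pool.close()
    pool.join()
    # all results are now in x

The trade-off is that the recursive structure is flattened into a work list; if you need the per-branch ordering of the original recursion, you would have to track it explicitly.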

DouglasDD
  • Thanks for the detailed algos and options. Following your opinion to use multiprocessing (`import multiprocessing as mlt`), I am not able to translate the 'job'-related pieces. 1. How could I find number_active_jobs() in mlt, or are you suggesting I use threading as well? 2. What did you mean by `a_job = None`? If it is a variable, when is it updated for the last condition to make sense? 3. I am also too inexperienced with this 'mlt' module to understand `fork_job(func2, a, x)`. I checked `mlt.forking` but didn't know which one you were referring to. – some1 Apr 21 '14 at 23:20
  • NOTE: in order for x to be shared memory across the multiprocessing processes, `x` would need to be a `mlt.Array` instead of a simple Python list. Answers: (1) After a bit more digging it looks like a fixed-size pool would be easier than trying to count the forked jobs: `pool = mlt.Pool(processes=4)`. (2) Oops, I edited the code above to add the assignment that I forgot. (3) The equivalent of fork_job(func2, a, x) with a multiprocessing Pool would be something like `pool.apply_async(func2, [a, x])`. (*) Lots more detail in the examples at https://docs.python.org/2/library/multiprocessing.html – DouglasDD Apr 23 '14 at 05:35
  • Hey Douglas, thanks a lot. My fear was in fact pointless: the code by itself does the job I wanted just fine. But your feedback is definitely great input for me, because my next step is using multiple cores, for which I need 'mlt' or 'th' that I don't yet know well enough. Thanks a lot!!! – some1 Apr 28 '14 at 11:02

From your explanation I have a feeling that it is not b that gets updated (it isn't, as DouglasDD explained), but x. To let both recursive calls work on the same x, you need to take some sort of snapshot of x. The simplest way is to pass along the index of the newly appended tuple, along the lines of:

def func2(c, x, length):
    ...
    x.append((a, b))
    func2(a, x, length + 1)
    func2(b, x, length + 1)
user58697
  • Thanks, but DouglasDD understood my problem. Quite frankly, I also thought there might be a simple trick with some identifier to switch parameters for func2, similar to your idea. I am not sure, though, if yours would work for me, because a,b = func1(c), i.e. there is another function func1 to be called within func2. – some1 Apr 22 '14 at 13:36