
I want to implement a recursive parallel algorithm where a pool is created only once. At each time step the pool does a batch of jobs, waits for all of them to finish, and then the processes are called again with the previous outputs as inputs, and so on at the next time step.

My problem is that I have implemented a version where the pool is created and killed at every time step, but this is extremely slow, even slower than the sequential version. When I try to implement a version where the pool is created only once at the beginning, I get an assertion error when I call join().

This is my code:

import multiprocessing as mp
import numpy as np

def log_result(result):
    tempx, tempb, u = result
    X[:, u, np.newaxis], b[:, u, np.newaxis] = tempx, tempb

workers = mp.Pool(processes=4)
for t in range(p, T):

    count = 0  # ==========This is only master's job=============
    for l in range(p):
        for k in range(4):
            gn[count] = train[t-l-1, k]
            count += 1
    G = G*v + gn @ gn.T  # ==================================

    if __name__ == '__main__':
        for i in range(4):
            workers.apply_async(OULtraining, args=(train[t, i], X[:, i, np.newaxis], b[:, i, np.newaxis], i, gn), callback=log_result)

        workers.join()   # this is where the assertion error is raised

X and b are the matrices that I want to update directly in the master's memory.

What is wrong here that gives me the assertion error?

Can I implement what I want with the pool, or not?

Karl Knechtel
Bekromoularo

1 Answer


You cannot join() a pool that has not been closed first, as join() waits for the worker processes to terminate, not for the jobs to complete (see section 17.2.2.9 of https://docs.python.org/3.6/library/multiprocessing.html).
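For illustration, the pattern join() expects is close-then-join, which is what your per-time-step version already does. A minimal sketch of that legal pattern, with a hypothetical do_work standing in for your real task:

import multiprocessing as mp

def do_work(i):
    return i * i  # placeholder for the real task

if __name__ == '__main__':
    for t in range(3):  # one pool per time step, as in the slow version
        pool = mp.Pool(processes=4)
        results = [pool.apply_async(do_work, args=(i,)) for i in range(4)]
        pool.close()  # no further jobs may be submitted
        pool.join()   # legal now: waits for the workers to exit
        print([r.get() for r in results])  # results survive the join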

But as that closes the pool, which is not what you want, you cannot use it here. So join() is out, and you need to implement a "wait until all jobs are completed" mechanism yourself.

One way of doing this without busy loops is to use a queue. You could also work with bounded semaphores, but they do not work on all operating systems.

import multiprocessing as mp
import numpy as np

counter = 0
lock_queue = mp.Queue()
counter_lock = mp.Lock()

def log_result(result):
    global counter  # without this, counter += 1 raises UnboundLocalError
    tempx, tempb, u = result
    X[:, u, np.newaxis], b[:, u, np.newaxis] = tempx, tempb
    with counter_lock:
        counter += 1
        if counter == 4:
            counter = 0
            lock_queue.put(42)  # dummy message: all jobs of this step are done


workers = mp.Pool(processes=4)
for t in range(p, T):

    count = 0  # ==========This is only master's job=============
    for l in range(p):
        for k in range(4):
            gn[count] = train[t-l-1, k]
            count += 1
    G = G*v + gn @ gn.T  # ==================================

    if __name__ == '__main__':
        counter = 0
        for i in range(4):
            workers.apply_async(OULtraining, args=(train[t, i], X[:, i, np.newaxis], b[:, i, np.newaxis], i, gn), callback=log_result)

        lock_queue.get(block=True)  # block here until the callback signals

This resets a global counter before submitting jobs. As soon as a job completes, your callback increments the global counter. When the counter hits 4 (your number of jobs), the callback knows it has processed the last result, and a dummy message is put on a queue. Your main program waits at Queue.get() for something to appear there.

This allows your main program to block until all jobs have completed, without closing down the pool.
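For completeness, the semaphore alternative mentioned above would look roughly like the sketch below, again with a hypothetical do_work in place of OULtraining: the callback releases the semaphore once per finished job, and the main loop acquires it once per submitted job.

import multiprocessing as mp

NUM_JOBS = 4
done = mp.Semaphore(0)  # starts at zero, so acquire() blocks until released

def do_work(i):
    return i * i  # placeholder for the real task

def log_result(result):
    done.release()  # runs in the main process's result-handler thread

if __name__ == '__main__':
    workers = mp.Pool(processes=4)
    for t in range(3):
        for i in range(NUM_JOBS):
            workers.apply_async(do_work, args=(i,), callback=log_result)
        for _ in range(NUM_JOBS):
            done.acquire()  # block until every job of this step has signalled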

If you replace multiprocessing.Pool with ProcessPoolExecutor from concurrent.futures, you can skip this bookkeeping and use

concurrent.futures.wait(fs, timeout=None, return_when=ALL_COMPLETED)

to block until all submitted tasks have finished. From a functional standpoint there is no difference between these; the concurrent.futures method is a couple of lines shorter, but the result is exactly the same.
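A minimal sketch of that variant, once more with a hypothetical do_work as the task:

import concurrent.futures

def do_work(i):
    return i * i  # placeholder for the real task

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as ex:
        for t in range(3):
            fs = [ex.submit(do_work, i) for i in range(4)]
            concurrent.futures.wait(fs, return_when=concurrent.futures.ALL_COMPLETED)
            results = [f.result() for f in fs]  # apply results in the master here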

Hannu
  • Do you know a way to debug the code? It is still wrong even with your update. The weird thing is that if I close and join at every time step, it works properly even with the update, and it still does not work properly with a global pool. What could the error be, can you guess something? – Bekromoularo Mar 13 '18 at 14:44
  • I would just add print statements. I'd start with printing counter in your log_results just after it has been incremented. When you say "wrong", what do you mean by it? Do you get an exception or does it do something it is not supposed to do? – Hannu Mar 13 '18 at 14:50
  • It's just that the result is not the expected one. But when I create a new pool at every time step, it is. I just change where the pool is created, nothing more. That is why I say it is weird. – Bekromoularo Mar 13 '18 at 14:55
  • One thing you could try to narrow down the issue is to set `maxtasksperchild=1` in your pool. This is not what you want to end up with, as it will exit and recreate workers after each task, but it might narrow down the issue. I also notice you are modifying variables that look global in your threads, but you do not use any locks. Another possibility is that your assignment statements to X create a local copy of the object and you then keep working on that. Just guesses, as I am not familiar with numpy and the matrix operations. – Hannu Mar 13 '18 at 15:46
  • I do not use a lock when I update because each process updates only one row of the matrix, so the 4 processes are updating 4 different things. – Bekromoularo Mar 13 '18 at 16:29
  • Are you sure it is safe? It may be, but if Python/numpy do internally something invisible (for example create a copy, make your modifications to the copy, write copy back to original memory segment), this would not be thread safe even though you are intentionally only modifying a subset of the array. But this is just a thought. – Hannu Mar 14 '18 at 09:22
  • You could try adding locks and see if it solves your problem. Another check would be to put `print(id(X), id(b))` in your worker as the first and last lines, to ensure your global variables remain the same and are not given local incarnations of the same name. Sorry I can't be of more assistance. – Hannu Mar 14 '18 at 09:23
  • I tried to print the id as you told me, and when I use the global pool (as I want to) there is a problem with id(b). So that is where the mistake is. But using locks does not change anything. – Bekromoularo Mar 14 '18 at 10:43
  • As I said, I do not know how the internals work there but what happens is that your b becomes a local copy. I won't be able to help much more but at least now you know what you are dealing with. You could try something like this: https://stackoverflow.com/questions/1540049/replace-values-in-list-using-python where you assign values to b one by one in a loop and see if it helps. – Hannu Mar 14 '18 at 10:47
  • Also, think what is the difference with definitions of X and b if X works and b does not. Are they defined in the same part of the code? Are they objects of the same kind? Etc. – Hannu Mar 14 '18 at 10:48
  • b and X are defined in the same part of the code and they are just 2D float matrices. At every update, b is updated by a new observation and X is updated according to the b update. When the program finishes, b is the correct b and X is something really wrong. The thing is that the local copy does not affect b at the end and affects only X, which is really strange. And what is even stranger is that if I do not use the global pool, nothing goes wrong. Anyway, thank you for your valuable help. – Bekromoularo Mar 14 '18 at 10:55
  • I tried the ProcessPoolExecutor too, but it seems there is the same problem. I updated my code so I do not update any global variable, but again the same thing happens. Do you have any clue why it works with the pool being created and closed at every time step, but not with a global pool waiting for jobs to come? Is the first way still parallel? It is really frustrating. – Bekromoularo Mar 15 '18 at 10:20