
I have a program that has to print a message to the user after each successful step it performs. I tried using a Lock, but it really slows my program down.

Basically, what I want to do is print each successful operation to the user, in order. For example, I have some code that performs POST requests to certain pages, and I want to print that operation X is done, in an ordered way.

An example showing what I'm trying to do (it seems to work, but it really slows down the task):

from multiprocessing import Pool, Lock
import requests

lock = Lock()

def run(u):
    lock.acquire()
    buff = []
    e = requests.post('#personal_url', data=mydata)
    buff.append(e.text)
    buff.append('----------------')
    f = requests.get('#personal_urlx')
    buff.append(u + ' --> ' + f.text)
    print('\n'.join(buff))
    lock.release()

p = Pool(4)
p.map(run, uu)
p.close()
p.join()

I would really appreciate any help, thanks.

cojiko
  • Where is Martijn Pieters, the Python master!! – cojiko Mar 11 '18 at 18:04
  • Can the `requests` happen concurrently? – Davis Herring Mar 11 '18 at 18:18
  • @DavisHerring Yes, they can happen concurrently – cojiko Mar 11 '18 at 18:30
  • Is the accepted answer correct in that you want output collated by “user” (which is hinted at only by the `u` and `uu` names)? – Davis Herring Mar 11 '18 at 18:43
  • u stands for url, uu = urls. I selected it as correct because no one else answered – cojiko Mar 11 '18 at 18:53
  • I'm just curious, how else would one go about ordering the outputs of asynchronous calls other than waiting for them and collating them at the end? – postmalloc Mar 11 '18 at 19:06
  • @segfaux I don't want to print them all at once, I want to print them one by one, is that possible? – cojiko Mar 11 '18 at 19:20
  • Say, you start printing after each successful operation for each user. The first task for the first user completes, you print. Next, the first task for the second user completes. Should you print? If you do, do you expect to erase the output if in the next step, second task of the first user completes? Do you reprint everything again in order at that point in time? Perhaps I'm having difficulty imagining the exact behavior that you are expecting. – postmalloc Mar 11 '18 at 19:34
  • Yes, I want to do that – cojiko Mar 11 '18 at 19:45
  • @cojiko I've updated my answer to make that happen – postmalloc Mar 12 '18 at 02:50

2 Answers


What is probably slowing down your program is your locking strategy. Locks should only be used to protect so-called critical sections of code: sections that access shared resources which could end up in an invalid state if not protected correctly.

So my suggestion is: if your only concern is to get valid output on stdout (meaning that your prints are not interrupted and full lines are printed), protect stdout by writing a kind of extended print function and use your lock only there. Something like this:

def ext_print(msg, lock):
    # only printing is protected, so the requests themselves still run in parallel
    lock.acquire()
    print(msg)
    lock.release()

From your current code, please remove the operations on the lock and use the locking only inside the ext_print function.

def run(u):
    buff = []
    e = requests.post('#personal_url', data=mydata)
    buff.append(e.text)
    buff.append('----------------')
    f = requests.get('#personal_urlx')
    buff.append(u + ' --> ' + f.text)
    ext_print('\n'.join(buff), lock)
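
One caveat, and this is an assumption about how the pool is created rather than something shown in the question: with start methods that spawn fresh interpreters (Windows, and macOS on recent Python versions), a module-level Lock is not automatically available inside the pool workers. A common pattern is to hand it to each worker through the pool initializer, roughly like this (init_worker is just an illustrative name, and run and uu are reused from the code above):

from multiprocessing import Pool, Lock

def init_worker(l):
    # each worker process keeps its own reference to the shared lock
    global lock
    lock = l

if __name__ == '__main__':
    lock = Lock()
    p = Pool(4, initializer=init_worker, initargs=(lock,))
    p.map(run, uu)
    p.close()
    p.join()

On Linux, where the default start method is fork, the global lock is simply inherited by the workers and this extra step is not needed.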

Using this approach you should get clean output on stdout. Be aware that the order can still surprise you: given two workers t1 and t2, where t1 was started before t2, it is possible to see the output of t2 first, simply because t2 finished its requests earlier. So this approach keeps the performance gain of running the tasks in parallel, but it only guarantees that each task's output is printed as one uninterrupted block when that task finishes, not that the blocks appear in the order the tasks were started.

I think the only way to really write each output at the exact moment the corresponding operation finishes, and strictly in that order, is to go with a solution like this:

def run(u):
    buff = []

    lock.acquire()
    e = requests.post('#personal_url', data=mydata)
    print(e.text)
    print('----------------')
    lock.release()

    lock.acquire()
    f = requests.get('#personal_urlx')
    print(u + ' --> ' + f.text)
    print('----------------')
    lock.release()

As you can guess, the performance of this one will probably be worse, since the requests themselves are now serialized by the lock.
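
For completeness, a possible middle ground (not part of the answer above, just a sketch reusing uu, mydata and the placeholder URLs from the question): keep the requests concurrent, let run return its text instead of printing, and use Pool.imap, which yields results in the order of the input even though the workers run in parallel. The parent process then does all the printing, so no lock is needed and each block appears as soon as it and everything submitted before it has finished:

from multiprocessing import Pool
import requests

def run(u):
    # same work as before, but the text is returned instead of printed
    buff = []
    e = requests.post('#personal_url', data=mydata)
    buff.append(e.text)
    buff.append('----------------')
    f = requests.get('#personal_urlx')
    buff.append(u + ' --> ' + f.text)
    return '\n'.join(buff)

p = Pool(4)
# imap yields the results in the order of uu, each one as soon as it
# (and everything before it in uu) is ready
for out in p.imap(run, uu):
    print(out)
p.close()
p.join()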

0x51ba

UPDATED

I've made some changes to the code after reading your comment. It's probably not a great way to do it, but basically I fork off another process that polls the shared dictionary at some interval and redraws the console output. Note that this clears the entire console on every update. Hopefully this is the behavior you expect.

The code:

from multiprocessing import Process, Pool, Manager
import time

def run(user,logs):
    logs[user] += ['Message 1 for user: ' + user]
    time.sleep(2) #some work done per user
    logs[user] += ['Message 2 for user: ' + user]
    return 1

manager = Manager()
logs = manager.dict()

users = ['Tom', 'Bob', 'Dinesh', 'Ravi']

for user in users:
    logs[user] = [] #initialize empty list for each user

logs_list = [logs for i in range(len(users))] 

def poll(logs):
    while True:
        print("\033c") #clear the console
        for user in logs.keys():
            print('Logs for user:', user)
            print('\n'.join(logs[user]))
            print('----------------')
        time.sleep(0.1)

poller_process = Process(target=poll, args=(logs,))

poller_process.start()
p = Pool(4)
p.starmap(run, zip(users, logs_list))
p.close()
p.join()
poller_process.terminate()  # the poll loop never returns on its own, so stop it once the pool is done

------
Output (the logs under each user are refreshed constantly)
------
Logs for user: Tom
Message 1 for user: Tom
Message 2 for user: Tom
----------------
Logs for user: Bob
Message 1 for user: Bob
Message 2 for user: Bob
----------------
Logs for user: Dinesh
Message 1 for user: Dinesh
Message 2 for user: Dinesh
----------------
Logs for user: Ravi
Message 1 for user: Ravi
Message 2 for user: Ravi
----------------

This may not be a very elegant approach, but it works. You can aggregate the results from each process under a 'user' key in a shared dictionary, then iterate over the dictionary after pool.join() and print all the results in order. This eliminates the need for locks.

The code looks something like this:

from multiprocessing import Pool, Manager
import time

def run(user,logs):
    logs[user] += ['Message 1 for user: ' + user]
    time.sleep(1) #some work done per user
    logs[user] += ['Message 2 for user: ' + user]
    return 1

manager = Manager()
logs = manager.dict()

users = ['Tom', 'Bob', 'Dinesh', 'Ravi']

for user in users:
    logs[user] = [] #initialize empty list for each user

logs_list = [logs for i in range(len(users))] 

p = Pool(4)
p.starmap(run, zip(users,logs_list))
p.close()
p.join()

for user in logs.keys():
    print(logs[user])


------
Output:  
------
['Message 1 for user: Tom', 'Message 2 for user: Tom']
['Message 1 for user: Bob', 'Message 2 for user: Bob']
['Message 1 for user: Dinesh', 'Message 2 for user: Dinesh']
['Message 1 for user: Ravi', 'Message 2 for user: Ravi']
postmalloc