Python multiprocessing.Manager shared data and CLI pipe

Question

I've narrowed a problem I'm having down to this simple example.

This is test.py:

#!/usr/bin/env python3

import sys
import multiprocessing

_THREADS = 4

#------------------------------------------------------------------------------
def _process( work ) :
    index, shared_data = work
    print( "Thread", index )

#------------------------------------------------------------------------------
if __name__ == "__main__" :

    # Pipe output for this works.
    #shared_data = [ 1, 2, 3, 4 ]

    # Pipe output for this does not work.
    manager = multiprocessing.Manager()
    shared_data = manager.list( [ 1, 2, 3, 4 ] )

    work = [ ( index, shared_data ) for index in range( _THREADS ) ]

    print( "Processing..." )

    with multiprocessing.Pool( _THREADS ) as pool :
        pool.map( _process, work )

    print( "Done." )

If I run test.py and pipe its output, the print statements in the worker processes produce no output:

$ ./test.py | cat -
Processing...
Done.

If I replace the multiprocessing.Manager list with the plain list (the commented-out line), I get the expected output:

$ ./test.py | cat -
Processing...
Thread 0
Thread 1
Thread 3
Thread 2
Done.

Both versions produce the correct output when the output is not piped. I'm not sure why sharing data through multiprocessing.Manager causes this. Is there something I am missing?
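The pipe matters because Python chooses how to buffer `sys.stdout` when it starts: for an interactive terminal it is line-buffered, so each `print` appears immediately, while for a pipe or file it is block-buffered, so text sits in memory until the buffer fills, is flushed explicitly, or is flushed at normal interpreter exit. The small check below is a sketch for confirming this; the script and the function name `describe_stdout` are illustrative, not part of the original code.

```python
import sys


def describe_stdout() -> str:
    """Report whether sys.stdout is a terminal and how it is buffered."""
    # sys.stdout is normally an io.TextIOWrapper whose line_buffering flag is
    # True for an interactive terminal and False for a pipe or file.
    # getattr guards against stdout having been replaced by an object that
    # lacks the attribute. (With python3 -u, writes go through immediately.)
    line_buffered = getattr(sys.stdout, "line_buffering", False)
    mode = "line-buffered" if line_buffered else "block-buffered"
    return f"isatty={sys.stdout.isatty()} -> {mode}"


if __name__ == "__main__":
    # Compare:  python3 check.py   versus   python3 check.py | cat -
    print(describe_stdout())
```

Run directly in a terminal it should typically report a line-buffered TTY; piped through `cat -` it should report a non-TTY, block-buffered stream, which is the situation in which the worker output goes missing.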

EDIT: As suggested, flushing the output in the print statement fixed the problem, so technically this is solved. Does anyone know why the print statements in the worker processes stop being flushed only when shared data is used?

#------------------------------------------------------------------------------
def _process( work ) :
    index, shared_data = work
    print( "Thread", index, flush=True )

You'll confuse yourself if you refer to processes as threads! — Mark Setchell, Mar 07 '22 at 15:57
@MarkSetchell Yeah, I'm not good at using those terms correctly with Python. — A. Que, Mar 07 '22 at 16:41
You could try to add `sys.stdout.flush()` at the end of the `_process` function. — Timus, Mar 07 '22 at 16:53
@Timus Wasn't expecting that to work, but it did. Why `stdout` isn't flushed when the process ends, and only when shared data is used, is still a mystery, but flushing manually does produce the output. Confused, but I'll take the working results. — A. Que, Mar 07 '22 at 17:44
To make it even stranger: you don't even have to use `shared_data` (remove it from `_process` and `work`); just creating it (`shared_data = manager.list( [ 1, 2, 3, 4 ] )`) is enough to get this behaviour most of the time, though not always. — Timus, Mar 07 '22 at 19:36
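One possible lead, offered as an unconfirmed assumption rather than an answer: the multiprocessing documentation states that leaving a `Pool`'s `with` block calls `terminate()`, which stops the worker processes immediately, while `close()` followed by `join()` lets the workers finish and exit on their own, which includes the normal flush of their standard streams. If the unflushed lines are being lost because some workers are still shutting down when `terminate()` arrives, shutting the pool down explicitly should make the output appear without `flush=True`. The function name `run` is hypothetical.

```python
#!/usr/bin/env python3
# Sketch: close() + join() the pool so workers exit normally before the
# with-block's implicit terminate() runs. No explicit flush in the worker.

import multiprocessing

_THREADS = 4


def _process(work):
    index, shared_data = work
    print("Thread", index)  # left unflushed on purpose
    return index


def run():  # hypothetical helper, not in the original
    with multiprocessing.Manager() as manager:
        shared_data = manager.list([1, 2, 3, 4])
        work = [(index, shared_data) for index in range(_THREADS)]
        with multiprocessing.Pool(_THREADS) as pool:
            results = pool.map(_process, work)
            pool.close()  # no more tasks will be submitted
            pool.join()   # wait for every worker to exit on its own
        # __exit__ still calls terminate(), but all workers have already
        # exited, so none of them is cut short.
    return results


if __name__ == "__main__":
    print("Processing...")
    run()
    print("Done.")
```

The design choice here is to keep the context managers (so resources are still cleaned up on errors) while making the orderly shutdown explicit.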
