
I want to input text to Python and process it in parallel. For that purpose I use multiprocessing.Pool. The problem is that sometimes, though not always, I have to input text multiple times before anything is processed.

This is a minimal version of my code to reproduce the problem:

import multiprocessing as mp
import time

def do_something(text):
    print('Out: ' + text, flush=True)
    # do some awesome stuff here

if __name__ == '__main__':
    p = None
    while True:
        message = input('In: ')
        if not p:
            p = mp.Pool()
        p.apply_async(do_something, (message,))

What happens is that I have to input text multiple times before I get a result, no matter how long I wait after entering something the first time. (As stated above, this does not happen every time.)

python3 test.py
In: a
In: a
In: a
In: Out: a
Out: a
Out: a

If I create the pool before the while loop or if I add time.sleep(1) after creating the pool, it seems to work every time. Note: I do not want to create the pool before I get an input.
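A minimal, non-interactive sketch of the working variant (pool created up front; `get()` is used here only to make completion observable, and `do_something` returns instead of prints for the same reason):

```python
import multiprocessing as mp

def do_something(text):
    # return instead of print so the result is observable from the parent
    return 'Out: ' + text

if __name__ == '__main__':
    p = mp.Pool()                      # create the pool before any work arrives
    result = p.apply_async(do_something, ('a',))
    print(result.get(timeout=10))      # completes promptly once workers are up
    p.close()
    p.join()
```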

Does anyone have an explanation for this behavior?

I'm running Windows 10 with Python 3.4.2.

EDIT: Same behavior with Python 3.5.1.


EDIT:

An even simpler example, with Pool and also ProcessPoolExecutor. I think the problem is the call to input() right after applying/submitting, which only seems to be a problem the first time something is applied/submitted.

import concurrent.futures
import multiprocessing as mp
import time

def do_something(text):
    print('Out: ' + text, flush=True)
    # do some awesome stuff here

# ProcessPoolExecutor
# if __name__ == '__main__':
#     with concurrent.futures.ProcessPoolExecutor() as executor:
#         executor.submit(do_something, 'a')
#         input('In:')
#         print('done')

# Pool
if __name__ == '__main__':
    p = mp.Pool()
    p.apply_async(do_something, ('a',))
    input('In:')
    p.close()
    p.join()
    print('done')
    Interestingly, on my Linux system it seems to always start processing input immediately, as expected: https://asciinema.org/a/4rhu9ibapsq8aalnj6z5ncugb I wonder if maybe the `flush=True` isn't actually causing output to be flushed? That would be easy to test (create a unique file per invocation of `do_something`, for example). – larsks Apr 28 '16 at 15:59
  • @larsks: Just tried it with creating files. It produces the same behavior. – the Apr 28 '16 at 16:09
  • @skrrgwasme: The pool is created only once within the loop. The code I really use joins and closes the pool if EOF is read. – the Apr 28 '16 at 16:11
    @the Are you ever *missing* the output of your first input, or does it just come late? I just entered "a", "b", and "c", and they indeed did appear late, but I saw all three eventually appear. Did any of your input fail to appear entirely? – skrrgwasme Apr 28 '16 at 16:15
  • @skrrgwasme: yes, the output appears late, but I'm not missing anything. Actually after the first time it returns something, it works as expected and returns everything immediately. – the Apr 28 '16 at 16:32
  • @the That's what I'm seeing too. Also, I replaced the printing with creating and opening a text file, and nothing is created until the third iteration. It's definitely not an output flushing issue - the function execution is being delayed. – skrrgwasme Apr 28 '16 at 16:35
  • @the This issue is *not* present on Python2.7.10. I'm very interested... – skrrgwasme Apr 28 '16 at 16:47

2 Answers


Your code worked when I tried it on my Mac.

In Python 3, it might help to explicitly declare how many worker processes will be in your pool (i.e., the number of simultaneous processes).

Try using p = mp.Pool(1):

import multiprocessing as mp
import time

def do_something(text):
    print('Out: ' + text, flush=True)
    # do some awesome stuff here

if __name__ == '__main__':
    p = None
    while True:
        message = input('In: ')
        if not p:
            p = mp.Pool(1)
        p.apply_async(do_something, (message,))
– jberry

I could not reproduce it on Windows 7, but there are a few long shots worth mentioning for your issue.

  1. Your AV (anti-virus) might be interfering with the newly spawned processes; try temporarily disabling it and see if the issue persists.
  2. Windows 10 might have a different IO caching algorithm; try inputting larger strings. If that works, it means the OS tries to be smart and sends the data only once a certain amount has piled up.
  3. As Windows has no fork() primitive, you might be seeing a delay caused by the spawn start method.
  4. Python 3 added a new pool of workers called ProcessPoolExecutor; I'd recommend using it regardless of this issue.
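Regarding point 4, a minimal sketch of the same task with ProcessPoolExecutor (the returned future is used to fetch the result, since print output from a worker can be hard to observe):

```python
import concurrent.futures

def do_something(text):
    # return the result so the parent can retrieve it via the future
    return 'Out: ' + text

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        future = executor.submit(do_something, 'a')
        print(future.result(timeout=10))
```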
– noxdafox
  • 1) AV? Anti Virus? I don't have anything running 2) Normally I use string with a length beyond 1000. It still shows the same problem. 4) Results in the same behavior – the Apr 29 '16 at 09:27