
In my program I have a utility function for executing shell commands; here's a simplified version of it:

import subprocess
import time

def run_command(cmd):
    s = time.time()
    print('starting subprocess')
    proc = subprocess.Popen(cmd.split(),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    print('subprocess started after ({0}) seconds'.format(time.time() - s))
    return proc

My program uses this function from different threads to execute commands. Occasionally the "Popen" line takes around 70 seconds to complete. Out of thousands of invocations per day across different program runs, this happens about 4-5 times. As far as I know, Popen is non-blocking. What is weird to me is that when it does happen, it takes the same ~70 seconds to start. It's important to note that while this happens, I have 3-4 other threads that are waiting in a loop:

while some_counter > 0:
    time.sleep(0.5) 

They do so for at most 60 seconds. After they give up and finish their flow, it takes another ~14 seconds until the "Popen" call finishes. Is there a problem with running "Popen" from some threads in parallel with other threads sitting in a "wait loop"?
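One way to test whether the combination described above (Popen from worker threads plus sleep-loop threads) is itself the problem is a small reproduction harness. This is a minimal sketch of my own, not from the question: the thread counts, iteration counts, the `true` command, and the slow-call threshold are all illustrative assumptions.

```python
# Hypothetical reproduction sketch: spawn Popen calls from several
# worker threads while other threads spin in a sleep loop, and log
# any Popen call whose startup exceeds a threshold.
import subprocess
import threading
import time

SLOW_THRESHOLD = 5.0  # seconds; anything above this gets logged

def popen_worker(n_iterations):
    for _ in range(n_iterations):
        start = time.time()
        proc = subprocess.Popen(['true'],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
        elapsed = time.time() - start
        if elapsed > SLOW_THRESHOLD:
            print('Popen took {0:.1f}s'.format(elapsed))
        proc.communicate()  # reap the child to avoid zombies

def wait_worker(stop_event):
    # Mimics the "wait loop" threads from the question.
    while not stop_event.is_set():
        time.sleep(0.5)

stop = threading.Event()
waiters = [threading.Thread(target=wait_worker, args=(stop,)) for _ in range(4)]
spawners = [threading.Thread(target=popen_worker, args=(50,)) for _ in range(3)]
for t in waiters + spawners:
    t.start()
for t in spawners:
    t.join()
stop.set()
for t in waiters:
    t.join()
```

If the stall never reproduces under this harness even at high iteration counts, that would point away from the threading pattern and toward something environmental on the affected machines.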

Update 1: I now see that this problem started after I switched from Fedora 27 + Python 3.6 to Fedora 31 + Python 3.7.
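Since Popen on POSIX creates the child via fork and exec under the hood, one way to narrow the problem down is to time a bare fork/exec/reap cycle outside of `subprocess` entirely. A minimal sketch, assuming a POSIX system; the helper name and the `true` command are my own:

```python
# Sketch (my own, not from the question): time a bare fork + exec +
# reap cycle. If the same multi-second stall shows up here, the
# problem is below Python, not in the subprocess module.
import os
import time

def time_fork_exec():
    start = time.time()
    pid = os.fork()
    if pid == 0:
        # Child: replace ourselves with a trivial command.
        os.execvp('true', ['true'])
    os.waitpid(pid, 0)  # parent reaps the child
    return time.time() - start

elapsed = time_fork_exec()
print('fork+exec+reap took {0:.4f} seconds'.format(elapsed))
```

Running this in the same threads and at the same times the real commands run would help separate a Python-level cause from a system-level one.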

Bob Sacamano
  • The usual reason for delays of this duration on random process start is botched dns/user info lookup (this happens often in all kinds of remote auth situations, such as active directory or ldap based ones). Do you see any delay when running an `id` command? – oakad Dec 23 '19 at 14:10
  • I think that the command that hangs is irrelevant as it is a different one each time. – Bob Sacamano Dec 24 '19 at 06:56
  • Not sure what are you talking about... anyway I ran the id command both directly in shell and via Popen, no delay. But it is not a good indication as I said because I get this issue on different runs on different machines and I can't reproduce it... It happens 1 in 1000(or even more). – Bob Sacamano Dec 24 '19 at 09:04
  • 1
    Have you checked your `syslog` for the time when this happens? Also, `sar` logs, if you have them. Perhaps it is not Python; perhaps it is something on the system, and you only see it on your program because you're looking (in this case, logging). – caxcaxcoatl Dec 26 '19 at 01:59
  • It almost seems like you might be using up many process ids and recycling them may force a period of non-use before creating a process with it. When you say thousands per day, what's the peak rate? – karmakaze Dec 29 '19 at 08:02
  • Umm what prints do you suggest I add to test this theory? – Bob Sacamano Dec 30 '19 at 08:35
  • Hey Bob, managed to fix this? I'm having a similar issue. – jamesfdearborn Jan 02 '20 at 22:46
  • Hey James, no. Did you managed to get a consistent reproduction? What is your environment(os, python version)? – Bob Sacamano Jan 05 '20 at 07:09
  • Hey Bob, are you using complete paths to your executables, if not what does your PATH variable look like - could be something as simple as an nfs stall as the shell searches for the command in your PATH? – Andrew Atrens Jan 08 '20 at 13:47
  • 1
    Also of possible interest - https://docs.python.org/3/library/subprocess.html .. if you read down far enough Popen child process can block on output to OS pipe buffer .. hope this helps. – Andrew Atrens Jan 08 '20 at 17:57
  • use async await with subprocess like this https://fredrikaverpil.github.io/2017/06/20/async-and-await-with-subprocesses/ – divyang4481 Jan 12 '20 at 09:12

0 Answers0