
I have a Python program designed to evolve a 3D model, which is CFD-analyzed in OpenFOAM. The analysis runs in parallel via "mpirun", which my Python script launches with subprocess.Popen. Nothing unusual so far. What is unusual is that when mpirun encounters an error with one of its children, kills its children, and prints an error, the Python parent process freezes. And it doesn't freeze at some obvious place, like reading from the pipe; it just stops doing anything, at seemingly random locations.

I tried running my program with "python3 -m trace --trace" to see which line it stops on. Here's the final output:

foam.py(1765):       print("-B")
-B
foam.py(1766):       if match:
foam.py(1776):       print("-A")
-A
foam.py(1777):       if re.match(" *Sum of moments *", line_text):
 --- modulename: re, funcname: match
re.py(163):     return _compile(pattern, flags).match(string)
 --- modulename: re, funcname: _compile
re.py(280):     try:
re.py(281):         p, loc = _cache[type(pattern), pattern, flags]
re.py(282):         if loc is None or loc == _locale.setlocale(_locale.LC_CTYPE):
re.py(283):             return p
foam.py(1780):       print("A")
A
foam.py(1781):       if force_mode:

As you can see, it gets as far as "if force_mode:" and then just stops. Obviously "if bool" should never hang. I've been trying to figure this out for several days and I'm no closer to an answer.

It doesn't seem to make a difference how I start the process via subprocess.Popen - shell=True, shell=False, running "mpirun" directly, running it through a bash wrapper script... nothing matters (the only thing I've kept consistent is stdout=subprocess.PIPE, since I have to be able to read the output). As soon as one of mpirun's children dies and it reports its error, foam.py just hangs.

Any clue what might be going on here? I'm stumped. :(

KarenRei
  • Perhaps it's stuck on the next line? What should execute next? – Eduard Sep 23 '18 at 21:56
  • From `python3 -m trace --help`: ```-t, --trace Print each line to sys.stdout before it is executed``` – KevinOrr Sep 23 '18 at 22:06
  • One way to debug this might be to use the `strace` program, possibly with the `-f` and/or `-b execve` flags. This will allow you to see what system calls the python process is making around the time it hangs. – KevinOrr Sep 23 '18 at 22:11
  • Where is the code that calls `subprocess`? I guess you're leaving some pipes open in your process and they are leaking to the subprocess via fork – nosklo Sep 23 '18 at 22:15
  • Strace without any flags: Wow, it seems to baffle even strace. The output ends with: "read(3,". It doesn't even finish writing out the read command! – KarenRei Sep 23 '18 at 22:27
  • "where is the code that calls subprocess" - as mentioned, I've gone through several while trying to see if I could find any that doesn't hang (no luck with that). Current command is: `p = subprocess.Popen(("mpirun -hostfile system/hostfile -np %d rhoPimpleFoam -parallel" % PROCESSORS).split(" "), stdout=subprocess.PIPE, shell=False)`. What do you mean about pipes "leaking"? – KarenRei Sep 23 '18 at 22:30
  • Edit: It's possible that the strace-not-finishing-writing might be due to "tee" - will try re-running it without tee. – KarenRei Sep 23 '18 at 22:36
  • Okay, now I'm beginning to suspect that tee was throwing me off, not showing the true actual place where the program was hanging, due to its buffering of output text. Now it's looking like it's actually hanging when trying to read from the pipe. Which could actually be reasonable! – KarenRei Sep 23 '18 at 22:46
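
A note on the buffering mentioned in these comments: when Python's stdout is a pipe (for example into tee) rather than a terminal, it is block-buffered by default, so the last few debug prints may never reach the log before a hang. A minimal sketch of one way around that, assuming a hypothetical helper named `debug` that is not part of the original foam.py:

```python
import sys


def debug(msg, stream=None):
    """Print a debug marker and flush it immediately.

    `debug` is a hypothetical helper, not something from the original
    script. Flushing after every message means the text has already left
    Python's buffer if the process hangs on the next statement.
    """
    if stream is None:
        stream = sys.stdout
    print(msg, file=stream, flush=True)
```

Running the script as `python3 -u foam.py` should have a similar effect without touching the print calls, since `-u` makes the standard streams unbuffered.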

1 Answer


Answer: As noted in the comments above, piping the program's output through "tee" (to log it so that I could examine all of the messages) was actually misleading me. Because output piped into tee is buffered, I wasn't seeing the last messages to be printed. After removing tee, I was able to see that the program was actually hanging on a pipe read. I fixed this by watching the output for mpirun's death messages and then calling kill on the subprocess.
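
A rough sketch of that fix, not the exact code from foam.py. The names `run_and_watch` and `DEATH_PATTERN` are made up for illustration, the pattern is only a guess at what mpirun prints on failure and should be adjusted to the real messages, and merging stderr into stdout is an assumption (the original command only piped stdout):

```python
import re
import subprocess

# Hypothetical pattern: replace with the text mpirun actually prints
# when one of its children dies.
DEATH_PATTERN = re.compile(r"mpirun noticed|MPI_ABORT was invoked")


def run_and_watch(cmd, death_pattern=DEATH_PATTERN):
    """Run cmd, collect its output lines, and kill it on a death message.

    Returns (returncode, lines). Without the kill, the read loop could
    block forever if something keeps the write end of the pipe open
    after the error has been reported.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: errors may arrive on stderr
        text=True,
    )
    lines = []
    try:
        for line in proc.stdout:
            lines.append(line)
            if death_pattern.search(line):
                proc.kill()  # stop waiting for output that will never come
                break
    finally:
        proc.stdout.close()
        proc.wait()  # reap the child so it does not linger as a zombie
    return proc.returncode, lines
```

With the command from the comments, this would be called as `run_and_watch(("mpirun -hostfile system/hostfile -np %d rhoPimpleFoam -parallel" % PROCESSORS).split(" "))`. Note that `proc.kill()` only signals mpirun itself; whether its remaining children also exit depends on the MPI setup.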

Thanks for the help!

KarenRei