
Part of a large project I have includes a section, in Python, like this:

import subprocess  # logwrite is the project's own logging helper

failcount = 0
done = False
while not done:
    try:
        result = subprocess.check_output(program)
        done = True
    except subprocess.CalledProcessError as e:
        failcount += 1
        logwrite('logfile.txt', 'Failed. Counter = {0}\nError message: {1}\n-'.format(failcount, e.returncode))
        if failcount == 20:
            print 'It failed 20 times, aborting...'
            quit()

What this is meant to do is run "program" from the command line. "program" is a large computational chemistry package which fails sometimes, so I run it in a loop here. If it fails 20 times, then my Python script terminates. This works just fine and does what is intended. However, my issue is that the chemistry package takes about three hours for each attempt, and I want to monitor its progress as it runs.

If I run it from the command line manually I can simply do "program > logfile" and then tail -f the logfile to watch it go. However, it seems you can't do something in Python like:

subprocess.check_output(['program', '>', 'logfile'])

Is there a way to have Python print out the contents of subprocess.check_output as it is being filled? I think subprocess.check_output just returns whatever went to stdout. Can I clone it between Python and a pipe somehow?

Possible workaround: I made a bash script called run_program.sh which just does program > logfile as I listed above, and then I used Python's subprocess to execute run_program.sh. This way I can monitor it as desired, but now the output of program is in a file instead of in Python, so I would have to have Python read a large logfile and capture error messages from it if needed. I would prefer to avoid something like this.

iammax

1 Answer


Instead of using subprocess.check_output, you can use subprocess.Popen. This object represents your subprocess, and has stdout and stderr attributes that you can read. If your subprocess only writes to stdout, you can probably just call Popen.stdout.readline() in a loop. However, if the subprocess writes to other pipes, you may hit a deadlock (see the documentation for details). In that case, I'd recommend the consume function described at http://stefaanlippens.net/python-asynchronous-subprocess-pipe-reading/, which safely lets you print stdout and stderr line-by-line as they are produced by the subprocess.
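For example, a minimal sketch along those lines (Python 2, matching the question; the logfile name is just illustrative, and merging stderr into stdout is a simplification) that echoes each line as it arrives, appends it to a logfile, and keeps the full output for later parsing:

import subprocess
import sys

# Merge stderr into stdout so there is only one pipe to read (this also
# sidesteps the deadlock mentioned above), then echo each line as it arrives.
proc = subprocess.Popen(['program'], stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT)
lines = []
with open('logfile.txt', 'w') as log:
    for line in iter(proc.stdout.readline, ''):
        sys.stdout.write(line)    # live view, like tail -f
        log.write(line)
        lines.append(line)        # keep the output for later parsing
proc.wait()
result = ''.join(lines)
if proc.returncode != 0:
    # plays the same role as the CalledProcessError branch in the question
    raise subprocess.CalledProcessError(proc.returncode, 'program')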

Alternatively, the redirection approach should work if you pass shell=True to check_output and give the command as a single string, e.g. subprocess.check_output('program > logfile', shell=True). The > is a shell redirection that isn't recognized when the command is run directly as a standalone program, and with shell=True a list argument isn't joined into a single command line, so use a string.

EDIT: The above won't return any output for your Python program to use, since everything is redirected into the logfile. Instead, use subprocess.check_output('program | tee logfile', shell=True), which writes the logfile via tee while still returning the output to Python.
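In the question's retry loop, that call would simply replace the existing check_output line; a rough sketch (logfile name as in the question):

# output is captured in result for parsing AND written to logfile by tee,
# so the file can be followed with tail -f while the job runs
result = subprocess.check_output('program | tee logfile', shell=True)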

If using shell=True, be careful that you have full control over the argument to check_output. For security, never allow any user or network input to be passed to the shell. See the security warning in the subprocess documentation for why.

BingsF
  • To clarify: shell redirection and piping are handled directly by a Unix shell (in terms of underlying system calls: open(), dup(), pipe(), and so on). This is all done by a subshell prior to any calls to execve(). Python subprocess.Popen() will directly call execv() if possible but will create a shell and pass a command for parsing if shell=True. The former (default) approach is safer as there are many features of shell command parsing which can be exploited in subtle, even obscure ways. – Jim Dennis Feb 15 '18 at 20:36
  • You seem to be correct about result = subprocess.check_output(['program', '>', 'logfile']) working if I use shell=True; however, that leads to the problem of having to read the outfile. 'result' seems to be empty if I do that. The same thing happens if I use Popen (that's what I used in my workaround bash script). I was hoping to have the output of my program BOTH piped to a file and fed into a Python string simultaneously. Maybe that's impossible, who knows. It may be better for me to just have Python read in the logfile afterwards... – iammax Feb 15 '18 at 21:22
  • @iammax You can use `tee` to pipe output to a file and still send it to stdout at the same time. i.e. `program | tee logfile` will write the output from `program` to stdout and the file. – BingsF Feb 15 '18 at 22:55
  • a problem with that answer: If I do result = subprocess.check_output('program | tee logfile', shell=True) and the program is killed in the terminal (with pkill program; if I see in the logfile that it's in a loop, I have to kill it), it's SUPPOSED to return a fail code. But it doesn't, and check_output does NOT raise the exception. It puts program's incomplete output into "result", which then crashes my other function which parses the result. It does report a failure as expected, however, if I leave off the | tee logfile part. Is that because of program's behavior, or tee's? – iammax Feb 20 '18 at 05:29
  • Adding to the previous comment: It also "fails correctly" if I simply redirect it to a file with >. But not with tee. – iammax Feb 20 '18 at 05:31
  • @iammax That's because of the behavior of pipelines in your shell (probably bash). Normally a pipeline exits with the status of the rightmost process, which in this case is `tee`. In bash you can override that behavior with the [pipefail option](https://stackoverflow.com/questions/32684119/exit-when-one-process-in-pipe-fails). Alternatively you can manage the processes more explicitly by [implementing the pipeline in python](https://docs.python.org/2/library/subprocess.html#replacing-shell-pipeline), as sketched below. – BingsF Feb 20 '18 at 16:54
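A minimal sketch of that pipeline-in-Python approach, following the subprocess documentation's "replacing shell pipeline" pattern (command and logfile names as in the question), so that the exit status of program itself is checked rather than tee's:

import subprocess

# run "program" and pipe its stdout through tee, managing both processes
prog = subprocess.Popen(['program'], stdout=subprocess.PIPE)
tee = subprocess.Popen(['tee', 'logfile'], stdin=prog.stdout,
                       stdout=subprocess.PIPE)
prog.stdout.close()              # let "program" get SIGPIPE if tee exits early
result = tee.communicate()[0]    # program's output, also written to logfile
prog.wait()
if prog.returncode != 0:         # check "program" itself, not tee
    raise subprocess.CalledProcessError(prog.returncode, 'program')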